﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++博客-I believe-随笔分类-Python</title><link>http://www.cppblog.com/luyulaile/category/19897.html</link><description>I  can</description><language>zh-cn</language><lastBuildDate>Tue, 04 Dec 2012 08:21:22 GMT</lastBuildDate><pubDate>Tue, 04 Dec 2012 08:21:22 GMT</pubDate><ttl>60</ttl><item><title>Python extract all comments:提取所有comments,提取c/c++中注释Python脚本</title><link>http://www.cppblog.com/luyulaile/archive/2012/12/03/195907.html</link><dc:creator>luis</dc:creator><author>luis</author><pubDate>Mon, 03 Dec 2012 00:35:00 GMT</pubDate><guid>http://www.cppblog.com/luyulaile/archive/2012/12/03/195907.html</guid><wfw:comment>http://www.cppblog.com/luyulaile/comments/195907.html</wfw:comment><comments>http://www.cppblog.com/luyulaile/archive/2012/12/03/195907.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/luyulaile/comments/commentRss/195907.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/luyulaile/services/trackbacks/195907.html</trackback:ping><description><![CDATA[注意，我们只是简单的提取// 以及/* */之间的内容。<br />如果 程序中出现了&#8220;/*&#8221;,会有bug<br /><br /><div style="background-color: #eeeeee; font-size: 13px; border: 1px solid #cccccc; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><span style="color: #008000; ">#</span><span style="color: #008000; ">!/usr/bin/env&nbsp;python</span><span style="color: #008000; "><br /></span><br /><span style="color: #0000FF; ">import</span>&nbsp;sys<br /><span style="color: #0000FF; ">import</span>&nbsp;re<br /><br /><span style="color: #0000FF; 
">def</span>&nbsp;comment_finder(text):<br />&nbsp;&nbsp;&nbsp;&nbsp;pattern&nbsp;=&nbsp;re.compile(&nbsp;r<span style="color: #800000; ">'</span><span style="color: #800000; ">//.*?$|/\*.*?\*/</span><span style="color: #800000; ">'</span>,&nbsp;re.DOTALL&nbsp;|&nbsp;re.MULTILINE)<br />&nbsp;&nbsp;&nbsp;&nbsp;result&nbsp;=&nbsp;pattern.findall(text)<br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">return</span>&nbsp;result<br /><br /><span style="color: #0000FF; ">def</span>&nbsp;print_command(filename):<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;codefile&nbsp;=&nbsp;open(filename,<span style="color: #800000; ">'</span><span style="color: #800000; ">r</span><span style="color: #800000; ">'</span>)<br />&nbsp;&nbsp;&nbsp;&nbsp;commentfile&nbsp;=&nbsp;open(filename+<span style="color: #800000; ">"</span><span style="color: #800000; ">.txt</span><span style="color: #800000; ">"</span>,<span style="color: #800000; ">'</span><span style="color: #800000; ">w</span><span style="color: #800000; ">'</span>)<br />&nbsp;&nbsp;&nbsp;&nbsp;lines=codefile.read()<br />&nbsp;&nbsp;&nbsp;&nbsp;codefile.close()<br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">the&nbsp;list&nbsp;of&nbsp;comments</span><span style="color: #008000; "><br /></span>&nbsp;&nbsp;&nbsp;&nbsp;list_of_comments&nbsp;=&nbsp;comment_finder(lines)<br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">for</span>&nbsp;comment&nbsp;<span style="color: #0000FF; ">in</span>&nbsp;list_of_comments:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">print&nbsp;comment[0:2]</span><span style="color: #008000; "><br /></span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">if</span>&nbsp;comment[0:2]&nbsp;==&nbsp;<span style="color: #800000; ">"</span><span style="color: #800000; ">//</span><span style="color: #800000; ">"</span>:<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;comment_to_write&nbsp;=&nbsp;comment[2:]<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">else</span>:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;comment_to_write&nbsp;=&nbsp;comment[2:-2]<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">if</span>&nbsp;len(comment_to_write)!=0:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;commentfile.write(comment_to_write)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;commentfile.write(<span style="color: #800000; ">'</span><span style="color: #800000; ">\n</span><span style="color: #800000; ">'</span>)<br />&nbsp;&nbsp;&nbsp;&nbsp;commentfile.close()<br /><br /><span style="color: #0000FF; ">if</span>&nbsp;<span style="color: #800080; ">__name__</span>&nbsp;==&nbsp;<span style="color: #800000; ">"</span><span style="color: #800000; ">__main__</span><span style="color: #800000; ">"</span>:<br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">for</span>&nbsp;filename&nbsp;<span style="color: #0000FF; ">in</span>&nbsp;sys.argv[1:]:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print_command(filename)</div>Usage:<br /><br /><div>On Linux, change to the directory that holds the source files and run ./get_comment.py *<br />or restrict the run to one file type:<br /><div>./get_comment.py *.c</div><div></div></div><div></div><img src ="http://www.cppblog.com/luyulaile/aggbug/195907.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/luyulaile/" target="_blank">luis</a> 2012-12-03 08:35 <a href="http://www.cppblog.com/luyulaile/archive/2012/12/03/195907.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Python notes: pi, tan, 
and other formulas</title><link>http://www.cppblog.com/luyulaile/archive/2012/11/08/194861.html</link><dc:creator>luis</dc:creator><author>luis</author><pubDate>Thu, 08 Nov 2012 00:21:00 GMT</pubDate><guid>http://www.cppblog.com/luyulaile/archive/2012/11/08/194861.html</guid><wfw:comment>http://www.cppblog.com/luyulaile/comments/194861.html</wfw:comment><comments>http://www.cppblog.com/luyulaile/archive/2012/11/08/194861.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/luyulaile/comments/commentRss/194861.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/luyulaile/services/trackbacks/194861.html</trackback:ping><description><![CDATA[<div style="text-align: justify;"><div>import math<br /># mianji: area of a regular polygon with n sides of length s<br />def mianji(n,s):</div><div>&nbsp; &nbsp; # use 0.25, not 1/4: in Python 2, 1/4 is integer division and evaluates to 0</div><div>&nbsp; &nbsp; temp=0.25*n*(s**2)/math.tan(math.pi/n)</div><div>&nbsp; &nbsp; return temp</div><div>print(mianji(5,7))<br />============<br />Note that pi and tan are accessed through the module, as math.pi and math.tan.</div></div><img src ="http://www.cppblog.com/luyulaile/aggbug/194861.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/luyulaile/" target="_blank">luis</a> 2012-11-08 08:21 <a href="http://www.cppblog.com/luyulaile/archive/2012/11/08/194861.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Python notes 2: //</title><link>http://www.cppblog.com/luyulaile/archive/2012/11/08/194858.html</link><dc:creator>luis</dc:creator><author>luis</author><pubDate>Wed, 07 Nov 2012 22:43:00 
GMT</pubDate><guid>http://www.cppblog.com/luyulaile/archive/2012/11/08/194858.html</guid><wfw:comment>http://www.cppblog.com/luyulaile/comments/194858.html</wfw:comment><comments>http://www.cppblog.com/luyulaile/archive/2012/11/08/194858.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/luyulaile/comments/commentRss/194858.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/luyulaile/services/trackbacks/194858.html</trackback:ping><description><![CDATA[In Python, both double quotes &#8220;&#8221; and single quotes &#8216;&#8217; can delimit a string.&nbsp;<br /><br /><h5>s1 = "hello,world"&nbsp;<br />To split a string literal across several source lines, end each line with a backslash \ (the line-continuation character), for example:&nbsp;<br />s2 = "hello,\<br />world"&nbsp;<br />s2 is identical to s1. With triple double quotes, the line breaks can be written directly:&nbsp;<br />s3 = """hello,<br />world,<br />hahaha."""<br />so s3 is actually "hello,\nworld,\nhahaha."; note the newline characters "\n".</h5><br /><strong style="font-family: Arial; line-height: 26px; background-color: #ffffff; ">s5 = "Let's go"&nbsp;</strong>&nbsp;<br /><strong style="font-family: Arial; line-height: 26px; background-color: #ffffff; ">s4 = 'Let\'s go'</strong>&nbsp;<br /><br />A triple-quoted string ''' &nbsp;''' can also serve as a multi-line comment.<br /><br /><span style="color: #222222; font-family: arial, sans-serif; font-size: small; line-height: 17.77777862548828px; background-color: #ffffff; ">str(object)</span>&nbsp;converts any object to a string.<br /><br /><br /><table width="496" style="color: #becdcd; font-family: Helvetica, Tahoma, Arial, sans-serif; font-size: 14px; line-height: 25.200000762939453px; text-align: left; background-color: #3c4e4e; width: 496px; height: 323px; "><tbody><tr><td style="font-size: 1em; "><strong>python</strong></td><td style="font-size: 1em; "><strong>java</strong></td><td style="font-size: 1em; "><strong>描述</strong></td></tr><tr><td style="font-size: 1em; ">or</td><td style="font-size: 1em; ">||</td><td style="font-size: 1em; ">逻辑或</td></tr><tr><td style="font-size: 1em; ">and</td><td style="font-size: 1em; ">&amp;&amp;</td><td style="font-size: 1em; 
">逻辑与</td></tr><tr><td style="font-size: 1em; ">not</td><td style="font-size: 1em; ">！</td><td style="font-size: 1em; ">逻辑非</td></tr><tr><td style="font-size: 1em; ">&lt;，&gt;，&lt;=，&gt;=，==，!=或&lt;&gt;</td><td style="font-size: 1em; ">&lt;，&gt;，&lt;=，&gt;=，==，!=</td><td style="font-size: 1em; ">比较操作</td></tr><tr><td style="font-size: 1em; ">is，is not</td><td style="font-size: 1em; ">instanceof</td><td style="font-size: 1em; ">身份认证</td></tr><tr><td style="font-size: 1em; ">|</td><td style="font-size: 1em; ">|</td><td style="font-size: 1em; ">位或</td></tr><tr><td style="font-size: 1em; ">&amp;</td><td style="font-size: 1em; ">&amp;</td><td style="font-size: 1em; ">位与</td></tr><tr><td style="font-size: 1em; ">^</td><td style="font-size: 1em; ">^</td><td style="font-size: 1em; ">位异或</td></tr><tr><td style="font-size: 1em; ">&lt;&lt;，&gt;&gt;</td><td style="font-size: 1em; ">&lt;&lt;，&gt;&gt;</td><td style="font-size: 1em; ">移位</td></tr><tr><td style="font-size: 1em; ">+，-，*，/</td><td style="font-size: 1em; ">+，-，*，/</td><td style="font-size: 1em; ">加减乘除</td></tr><tr><td style="font-size: 1em; ">%</td><td style="font-size: 1em; ">%</td><td style="font-size: 1em; ">余数</td></tr><tr><td style="font-size: 1em; ">~</td><td style="font-size: 1em; ">~</td><td style="font-size: 1em; ">位取补</td></tr></tbody></table><br /><br />//运算符 &nbsp;<br />10/3==3<br />120//10==12<br />121//10==12<br />122//10==12<br />130//10==13<br />10//3.0==3.0<br /><br /><span style="color: #333333; font-family: tahoma, 宋体; line-height: 22px; background-color: #ffffff; ">A new operator, //, is the floor division operator. (Yes, we know it&nbsp;</span><br style="color: #333333; font-family: tahoma, 宋体; line-height: 22px; background-color: #ffffff; " /><span style="color: #333333; font-family: tahoma, 宋体; line-height: 22px; background-color: #ffffff; ">looks like C++'s comment symbol.) 
// always performs floor division no&nbsp;</span><br style="color: #333333; font-family: tahoma, 宋体; line-height: 22px; background-color: #ffffff; " /><span style="color: #333333; font-family: tahoma, 宋体; line-height: 22px; background-color: #ffffff; ">matter what the types of its operands are, so 1 // 2 is 0 and 1.0 //&nbsp;</span><br style="color: #333333; font-family: tahoma, 宋体; line-height: 22px; background-color: #ffffff; " /><span style="color: #333333; font-family: tahoma, 宋体; line-height: 22px; background-color: #ffffff; ">2.0 is also 0.0.</span>&nbsp;<br /><br />not ()<br /><img src ="http://www.cppblog.com/luyulaile/aggbug/194858.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/luyulaile/" target="_blank">luis</a> 2012-11-08 06:43 <a href="http://www.cppblog.com/luyulaile/archive/2012/11/08/194858.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>label switching</title><link>http://www.cppblog.com/luyulaile/archive/2012/10/31/194109.html</link><dc:creator>luis</dc:creator><author>luis</author><pubDate>Tue, 30 Oct 2012 19:36:00 GMT</pubDate><guid>http://www.cppblog.com/luyulaile/archive/2012/10/31/194109.html</guid><wfw:comment>http://www.cppblog.com/luyulaile/comments/194109.html</wfw:comment><comments>http://www.cppblog.com/luyulaile/archive/2012/10/31/194109.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/luyulaile/comments/commentRss/194109.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/luyulaile/services/trackbacks/194109.html</trackback:ping><description><![CDATA[label switching:<br />Take two boxes of apples as an example. The first box holds two apples and carries label a with probability 30% and label b with probability 70%; the second box holds four apples and carries label b with probability 40% and label a with probability 60%.<br />To get the total weight of all the apples, we only need to take the apples out of every box and weigh them.<br />But if we first weigh the apples in the box labelled a and then add the weight of the apples in the box labelled b, the two draws may pick the same box. This is the label switching problem.<img src ="http://www.cppblog.com/luyulaile/aggbug/194109.html" 
width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/luyulaile/" target="_blank">luis</a> 2012-10-31 03:36 <a href="http://www.cppblog.com/luyulaile/archive/2012/10/31/194109.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Python generate corpus using Dirichlet distribution</title><link>http://www.cppblog.com/luyulaile/archive/2012/10/28/193960.html</link><dc:creator>luis</dc:creator><author>luis</author><pubDate>Sun, 28 Oct 2012 02:13:00 GMT</pubDate><guid>http://www.cppblog.com/luyulaile/archive/2012/10/28/193960.html</guid><wfw:comment>http://www.cppblog.com/luyulaile/comments/193960.html</wfw:comment><comments>http://www.cppblog.com/luyulaile/archive/2012/10/28/193960.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/luyulaile/comments/commentRss/193960.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/luyulaile/services/trackbacks/193960.html</trackback:ping><description><![CDATA[<div>At first, let's define the sample function:<br /><br /><div style="background-color: #eeeeee; font-size: 13px; border: 1px solid #cccccc; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all; "><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><span style="color: #0000FF; ">def</span>&nbsp;sample(dist,&nbsp;num_samples=1):<br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #800000; ">"""</span><span style="color: #800000; "><br />&nbsp;&nbsp;&nbsp;&nbsp;Uses&nbsp;the&nbsp;inverse&nbsp;CDF&nbsp;method&nbsp;to&nbsp;return&nbsp;samples&nbsp;drawn&nbsp;from&nbsp;an<br />&nbsp;&nbsp;&nbsp;&nbsp;(unnormalized)&nbsp;discrete&nbsp;distribution.<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;Arguments:<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;dist&nbsp;--&nbsp;(unnormalized)&nbsp;distribution<br /><br 
/>&nbsp;&nbsp;&nbsp;&nbsp;Keyword&nbsp;arguments:<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;num_samples&nbsp;--&nbsp;number&nbsp;of&nbsp;samples&nbsp;to&nbsp;draw<br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #800000; ">"""</span><br /><br />&nbsp;&nbsp;&nbsp;&nbsp;cdf&nbsp;=&nbsp;cumsum(dist)<br />&nbsp;&nbsp;&nbsp;&nbsp;r&nbsp;=&nbsp;uniform(size=num_samples)&nbsp;*&nbsp;cdf[-1]<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">return</span>&nbsp;cdf.searchsorted(r)</div>As we can see, the sample function input two parameters, one is dist, which can be an un-normalized distribution, another is the sample we want to draw.<br /><br />Let's see how to generate corpus for&nbsp;<strong>Dirichlet--multinomial unigram language model</strong><br /><div style="background-color: #eeeeee; font-size: 13px; border: 1px solid #cccccc; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all; "><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><span style="color: #0000ff; ">def</span>&nbsp;generate_corpus(beta,&nbsp;mean,&nbsp;N):<br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #800000; ">"""</span><span style="color: #800000; "><br />&nbsp;&nbsp;&nbsp;&nbsp;Returns&nbsp;a&nbsp;corpus&nbsp;of&nbsp;tokens&nbsp;drawn&nbsp;from&nbsp;a&nbsp;Dirichlet--multinomial<br />&nbsp;&nbsp;&nbsp;&nbsp;unigram&nbsp;language&nbsp;model.&nbsp;Each&nbsp;token&nbsp;is&nbsp;an&nbsp;instance&nbsp;of&nbsp;one&nbsp;of&nbsp;V<br />&nbsp;&nbsp;&nbsp;&nbsp;unique&nbsp;word&nbsp;types,&nbsp;represented&nbsp;by&nbsp;indices&nbsp;0,&nbsp;<img src="http://www.cppblog.com/Images/dot.gif" alt="" />,&nbsp;V&nbsp;-&nbsp;1.<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;Arguments:<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;beta&nbsp;--&nbsp;concentration&nbsp;parameter&nbsp;for&nbsp;the&nbsp;Dirichlet&nbsp;prior<br />&nbsp;&nbsp;&nbsp;&nbsp;mean&nbsp;--&nbsp;V-dimensional&nbsp;mean&nbsp;of&nbsp;the&nbsp;Dirichlet&nbsp;prior<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;N&nbsp;--&nbsp;number&nbsp;of&nbsp;tokens&nbsp;to&nbsp;generate<br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #800000; ">"""</span><br /><br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">pass</span>&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">&nbsp;YOUR&nbsp;CODE&nbsp;GOES&nbsp;HERE</span><span style="color: #008000; "><br /></span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">print&nbsp;mean</span><span style="color: #008000; "><br /></span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">print&nbsp;beta&nbsp;</span><span style="color: #008000; "><br /></span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">print&nbsp;dot(mean,beta)</span><span style="color: #008000; "><br /></span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">print&nbsp;dirichlet(mean*beta,size=1)</span><span style="color: #008000; "><br /></span>&nbsp;&nbsp;&nbsp;&nbsp;temp=sample(dirichlet(beta*array(mean),size=1),N)<br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">print&nbsp;temp</span><span style="color: #008000; "><br /></span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">return</span>&nbsp;temp</div>please keep in mind the&nbsp;<span style="font-size: 13px; background-color: #eeeeee; ">dirichlet</span>&nbsp;function is &nbsp;&#8220;from numpy.random.mtrand import dirichlet"<br />and the parameters it receives are corresponding to beta*array(mean). 
beta is the concentration parameter, and mean is a vector whose entries sum to 1.<br /><br /><br /><br />Another way to generate the corpus is to integrate out phi and use the predictive probability of the next token:<br />P(w = v | D, H) = (N_v + beta * n_v) / (N + beta)<br />where N_v is the number of tokens of word type v generated so far, N is the total number of tokens generated so far, and n_v is entry v of the prior mean (mean[v] in the code).<br /><div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><span style="color: #0000FF; ">def</span>&nbsp;generate_corpus_collapsed(beta,&nbsp;mean,&nbsp;N):<br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #800000; ">"""</span><span style="color: #800000; "><br />&nbsp;&nbsp;&nbsp;&nbsp;Returns&nbsp;a&nbsp;corpus&nbsp;of&nbsp;tokens&nbsp;drawn&nbsp;from&nbsp;a&nbsp;Dirichlet--multinomial<br />&nbsp;&nbsp;&nbsp;&nbsp;unigram&nbsp;language&nbsp;model&nbsp;using&nbsp;the&nbsp;'collapsed'&nbsp;generative&nbsp;process<br />&nbsp;&nbsp;&nbsp;&nbsp;(i.e.,&nbsp;phi&nbsp;is&nbsp;not&nbsp;explicitly&nbsp;represented).&nbsp;Each&nbsp;token&nbsp;is&nbsp;an<br />&nbsp;&nbsp;&nbsp;&nbsp;instance&nbsp;of&nbsp;one&nbsp;of&nbsp;V&nbsp;unique&nbsp;word&nbsp;types.<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;Arguments:<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;beta&nbsp;--&nbsp;concentration&nbsp;parameter&nbsp;for&nbsp;the&nbsp;Dirichlet&nbsp;prior<br />&nbsp;&nbsp;&nbsp;&nbsp;mean&nbsp;--&nbsp;V-dimensional&nbsp;mean&nbsp;of&nbsp;the&nbsp;Dirichlet&nbsp;prior<br />&nbsp;&nbsp;&nbsp;&nbsp;N&nbsp;--&nbsp;number&nbsp;of&nbsp;tokens&nbsp;to&nbsp;generate<br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #800000; ">"""</span><br /><br />&nbsp;&nbsp;&nbsp;&nbsp;V&nbsp;=&nbsp;len(mean)&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">&nbsp;vocabulary&nbsp;size</span><span style="color: #008000; "><br /></span><br />&nbsp;&nbsp;&nbsp;&nbsp;corpus&nbsp;=&nbsp;zeros(N,&nbsp;dtype=int)&nbsp;<span style="color: #008000; ">#</span><span 
style="color: #008000; ">&nbsp;corpus</span><span style="color: #008000; "><br /></span><br />&nbsp;&nbsp;&nbsp;&nbsp;Nv&nbsp;=&nbsp;zeros(V,&nbsp;dtype=int)&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">&nbsp;counts&nbsp;for&nbsp;each&nbsp;word&nbsp;type</span><span style="color: #008000; "><br /></span><br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">pass</span>&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">&nbsp;YOUR&nbsp;CODE&nbsp;GOES&nbsp;HERE</span><span style="color: #008000; "><br /></span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">for</span>&nbsp;n&nbsp;<span style="color: #0000FF; ">in</span>&nbsp;xrange(N):<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;corpus[n]=sample((Nv+beta*array(mean))/(n+beta),1)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Nv[corpus[n]]+=1;&nbsp;&nbsp;&nbsp;&nbsp;<br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">return</span>&nbsp;corpus</div><br />Let's see how to generate corpus for <strong>Mixture of&nbsp;Dirichlet-multinomial unigram language model</strong>&nbsp;<br /><br /><div style="background-color: #eeeeee; font-size: 13px; border: 1px solid #cccccc; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all; "><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><span style="color: #0000FF; ">def</span>&nbsp;generate_corpus(alpha,&nbsp;m,&nbsp;beta,&nbsp;n,&nbsp;D,&nbsp;Nd):<br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #800000; ">"""</span><span style="color: #800000; "><br />&nbsp;&nbsp;&nbsp;&nbsp;Returns&nbsp;a&nbsp;grouped&nbsp;corpus&nbsp;drawn&nbsp;from&nbsp;a&nbsp;mixture&nbsp;of<br />&nbsp;&nbsp;&nbsp;&nbsp;Dirichlet--multinomial&nbsp;unigram&nbsp;language&nbsp;models.<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;Arguments:<br /><br 
/>&nbsp;&nbsp;&nbsp;&nbsp;alpha&nbsp;--&nbsp;concentration&nbsp;parameter&nbsp;for&nbsp;the&nbsp;Dirichlet&nbsp;prior&nbsp;over&nbsp;theta<br />&nbsp;&nbsp;&nbsp;&nbsp;m&nbsp;--&nbsp;T-dimensional&nbsp;mean&nbsp;of&nbsp;the&nbsp;Dirichlet&nbsp;prior&nbsp;over&nbsp;theta<br />&nbsp;&nbsp;&nbsp;&nbsp;beta&nbsp;--&nbsp;concentration&nbsp;parameter&nbsp;for&nbsp;the&nbsp;Dirichlet&nbsp;prior&nbsp;over&nbsp;phis<br />&nbsp;&nbsp;&nbsp;&nbsp;n&nbsp;--&nbsp;V-dimensional&nbsp;mean&nbsp;of&nbsp;the&nbsp;Dirichlet&nbsp;prior&nbsp;over&nbsp;phis<br />&nbsp;&nbsp;&nbsp;&nbsp;D&nbsp;--&nbsp;number&nbsp;of&nbsp;documents&nbsp;to&nbsp;generate<br />&nbsp;&nbsp;&nbsp;&nbsp;Nd&nbsp;--&nbsp;number&nbsp;of&nbsp;tokens&nbsp;to&nbsp;generate&nbsp;per&nbsp;document<br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #800000; ">"""</span><br />&nbsp;&nbsp;&nbsp;&nbsp;corpus&nbsp;=&nbsp;GroupedCorpus()<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">pass</span>&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">&nbsp;YOUR&nbsp;CODE&nbsp;GOES&nbsp;HERE</span><span style="color: #008000; "><br /></span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">determine&nbsp;the&nbsp;topic&nbsp;the&nbsp;distribution&nbsp;for&nbsp;topic&nbsp;dirichlet(dot(m,alpha),size=1)</span><span style="color: #008000; "><br /></span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">given&nbsp;the&nbsp;topic,&nbsp;the&nbsp;distribtuion&nbsp;for&nbsp;word&nbsp;dirichlet(dot(n,beta),size=1)</span><span style="color: #008000; "><br /></span>&nbsp;&nbsp;&nbsp;&nbsp;theta=dirichlet(alpha*array(m),1)<br />&nbsp;&nbsp;&nbsp;&nbsp;phis=dirichlet(beta*array(n),len(m))<br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">for</span>&nbsp;d&nbsp;<span style="color: #0000FF; ">in</span>&nbsp;range(0,D):<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[t]=sample(theta,1)<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #008000; ">#</span><span style="color: #008000; ">print&nbsp;groupVcab</span><span style="color: #008000; "><br /></span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;corpus.add(str(d),str(t),[str(x)&nbsp;<span style="color: #0000FF; ">for</span>&nbsp;x&nbsp;<span style="color: #0000FF; ">in</span>&nbsp;sample(phis[t,:],Nd)])<br />&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: #0000FF; ">return</span>&nbsp;corpus</div>Note that there are T topics (groups):<span style="font-size: 13px; background-color: #eeeeee; ">&nbsp;&nbsp;phis</span><span style="font-size: 13px; background-color: #eeeeee; ">=</span><span style="font-size: 13px; background-color: #eeeeee; ">dirichlet(beta</span><span style="font-size: 13px; background-color: #eeeeee; ">*</span><span style="font-size: 13px; background-color: #eeeeee; ">array(n),len(m))</span>&nbsp; draws T word distributions from the Dirichlet prior, one per topic, and every document assigned to the same topic t must use the same distribution phis[t,:].</div><img src ="http://www.cppblog.com/luyulaile/aggbug/193960.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/luyulaile/" target="_blank">luis</a> 2012-10-28 10:13 <a href="http://www.cppblog.com/luyulaile/archive/2012/10/28/193960.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Python empty array</title><link>http://www.cppblog.com/luyulaile/archive/2012/09/19/191202.html</link><dc:creator>luis</dc:creator><author>luis</author><pubDate>Wed, 19 Sep 2012 01:47:00 
GMT</pubDate><guid>http://www.cppblog.com/luyulaile/archive/2012/09/19/191202.html</guid><wfw:comment>http://www.cppblog.com/luyulaile/comments/191202.html</wfw:comment><comments>http://www.cppblog.com/luyulaile/archive/2012/09/19/191202.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/luyulaile/comments/commentRss/191202.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/luyulaile/services/trackbacks/191202.html</trackback:ping><description><![CDATA[<div>Python array usage &nbsp;&nbsp;<br />Start from an empty list and append to it directly (fragment from inside a function):</div><div>&nbsp; &nbsp; result=[]</div><div>&nbsp; &nbsp; for x in range(0,N):</div><div>&nbsp; &nbsp; &nbsp; &nbsp; temp=beta(b,n)</div><div>&nbsp; &nbsp; &nbsp; &nbsp; print temp</div><div>&nbsp; &nbsp; &nbsp; &nbsp; if temp &gt;= n:</div><div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; result.append("Yes") &nbsp;# append directly, no preallocation needed</div><div>&nbsp; &nbsp; &nbsp; &nbsp; else:</div><div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; result.append("No") # append directly, no preallocation needed</div><div>&nbsp; &nbsp; return result</div><img src ="http://www.cppblog.com/luyulaile/aggbug/191202.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/luyulaile/" target="_blank">luis</a> 2012-09-19 09:47 <a href="http://www.cppblog.com/luyulaile/archive/2012/09/19/191202.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Python notes</title><link>http://www.cppblog.com/luyulaile/archive/2012/09/09/190021.html</link><dc:creator>luis</dc:creator><author>luis</author><pubDate>Sun, 09 Sep 2012 06:47:00 
GMT</pubDate><guid>http://www.cppblog.com/luyulaile/archive/2012/09/09/190021.html</guid><wfw:comment>http://www.cppblog.com/luyulaile/comments/190021.html</wfw:comment><comments>http://www.cppblog.com/luyulaile/archive/2012/09/09/190021.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/luyulaile/comments/commentRss/190021.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/luyulaile/services/trackbacks/190021.html</trackback:ping><description><![CDATA[<span style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; ">Tutorial:</span>&nbsp;<a href="http://www.tutorialspoint.com/python/python_files_io.htm">http://www.tutorialspoint.com/python/python_files_io.htm</a>&nbsp;<br /><span style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; "><br />Python IO<br />Output: print<br /></span><pre style="font-family: 'Courier New', monospace; font-size: 12px; margin-bottom: 0px; margin-top: 0px; line-height: normal; background-color: #f1f1f1; ">str = raw_input("Enter your input: ")
print "Received input is : ", str</pre><span style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; "><br /><br />The basic Python types used in these notes are int, float, and str; Python also has other built-in types such as bool, list, and dict.&nbsp;<br />type(1.5) returns the type of a value.<br />Python does not implicitly convert numbers to strings during concatenation:<br />print "x = " + str(2.5) &nbsp;works<br />print "x = " + 2.5 &nbsp;fails with a TypeError<br />print 2.5 &nbsp;works, because print converts its argument itself<br /><br />Defining a function in Python:<br />def MethodName(para,para2): # note the colon here<br />&nbsp; &nbsp; &nbsp; if&nbsp;<br /><br />Python comments<br /># single-line comment<br />""" &nbsp;a triple-quoted string serves as a multi-line comment """<br /><br />Python imports<br />import math<br /><br /><br />Python str<br />str(var) converts var to a string<br />len(var)<br />var.upper()<br />var.lower()<br />var[2] is the third element (indices start at 0), as with a list<br />var[:3] is the first three elements, i.e. indices 0 through 3-1<br />var[2:4] is the elements at indices 2 through 4-1<br /><br />Python list<br />example=[a,b,c,d,e,f]<br />len(example)<br />Lists have a built-in sort method.<br /><br 
/>Python dictionary<br />A dictionary maps keys to values.<br />A value can be a list.<br />Only a key may appear inside the square brackets [].<br />Distinguish the three forms of deletion:<br /><br /><br /></span><pre style="font-family: 'Courier New', monospace; font-size: 12px; margin-bottom: 0px; margin-top: 0px; line-height: normal; background-color: #f1f1f1; ">del dict['Name']  # remove entry with key 'Name'
dict.clear()      # remove all entries in dict
del dict          # delete entire dictionary</pre><span style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; ">Built-in functions:<br /><br /></span><table border="1" width="100%" cellpadding="5" cellspacing="0" style="border-style: solid; border-color: #aaaaaa; font-family: verdana, helvetica, arial, sans-serif; font-size: 11px; width: 552px; background-color: #f1f1f1; border-collapse: collapse; padding-left: 5px; vertical-align: top; color: #000000; text-align: left; "><tbody><tr><td style="border-style: solid; border-color: #aaaaaa; vertical-align: top; margin-bottom: 0px; border-collapse: collapse; ">1</td><td style="border-style: solid; border-color: #aaaaaa; vertical-align: top; margin-bottom: 0px; border-collapse: collapse; "><a href="http://www.tutorialspoint.com/python/dictionary_cmp.htm" style="color: #900b09; background-color: transparent; ">cmp(dict1, dict2)</a><br />Compares elements of both dict.</td></tr><tr><td style="border-style: solid; border-color: #aaaaaa; vertical-align: top; margin-bottom: 0px; border-collapse: collapse; ">2</td><td style="border-style: solid; border-color: #aaaaaa; vertical-align: top; margin-bottom: 0px; border-collapse: collapse; "><a href="http://www.tutorialspoint.com/python/dictionary_len.htm" style="color: #900b09; background-color: transparent; ">len(dict)</a><br />Gives the total length of the dictionary. 
This would be equal to the number of items in the dictionary.</td></tr><tr><td style="border-style: solid; border-color: #aaaaaa; vertical-align: top; margin-bottom: 0px; border-collapse: collapse; ">3</td><td style="border-style: solid; border-color: #aaaaaa; vertical-align: top; margin-bottom: 0px; border-collapse: collapse; "><a href="http://www.tutorialspoint.com/python/dictionary_str.htm" style="color: #900b09; background-color: transparent; ">str(dict)</a><br />Produces a printable string representation of a dictionary</td></tr><tr><td style="border-style: solid; border-color: #aaaaaa; vertical-align: top; margin-bottom: 0px; border-collapse: collapse; ">4</td><td style="border-style: solid; border-color: #aaaaaa; vertical-align: top; margin-bottom: 0px; border-collapse: collapse; "><a href="http://www.tutorialspoint.com/python/dictionary_type.htm" style="color: #900b09; background-color: transparent; ">type(variable)</a><br />Returns the type of the passed variable. If passed variable is dictionary then it would return a dictionary type.</td></tr></tbody></table><span style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; "><br /><br /></span><span style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; "><br /><br />Defining classes and methods in Python<br />A class is not defined with def; use the class keyword directly:<br /><br />class ClassName(object):<br />Every class has an __init__(self,arg):<br />method. Note the two underscores on each side, four in total.<br />&nbsp;<br />Every method must take self as its first parameter, but callers do not pass self explicitly; see the example that follows.<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp;<br />For example:<br /></span><div>class Adder(object):</div><div><span style="white-space: pre; ">	</span>def __init__(self):</div><div><span style="white-space: pre; ">		</span>self.baseNum = 2</div><div><span style="white-space: pre; ">	</span>def prnt_num(self):</div><div><span style="white-space: pre; ">		</span>print 
self.baseNum</div><div><span style="white-space: pre; ">	</span>def add_to_base(self, arg):</div><div><span style="white-space: pre; ">		</span># Add arg to the stored base number, then print the result</div><div><span style="white-space: pre; ">		</span>self.baseNum += arg</div><div><span style="white-space: pre; ">		</span>print self.baseNum</div><div>objectVar = Adder()</div><div>objectVar.prnt_num()</div><div># self is supplied automatically; only arg is passed</div><div>objectVar.add_to_base(3)</div><span style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; "><br />In Python, refer to a class variable through the class name rather than self.xxx: reading self.xxx falls back to the class variable, but assigning to self.xxx creates a separate instance variable. Instance (member) variables are accessed through self.xxx.<br />Class variables are special because they belong to the&nbsp;</span><em style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; ">class</em><span style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; ">; the objects created do not get their own copies of the class variable. 
Class variables are accessed using the class name and dot notation.</span><br style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; " /><code style="padding: 2px 4px; font-family: Monaco, 'Courier New', monospace; font-size: 13px; color: #dd1144; border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px; background-color: #f7f7f9; border: 1px solid #e1e1e8; line-height: 1.5em; "><span style="white-space: inherit; "><span style="color: #c42b23 !important; ">ClassName</span>.<span style="color: #c42b23 !important; ">classVar</span></span></code><br style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; " /><span style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; ">Class variables are created&nbsp;</span><em style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; ">outside</em><span style="color: #333333; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; line-height: 20px; background-color: #ffffff; ">&nbsp;of&nbsp;</span><code style="padding: 2px 4px; font-family: Monaco, 'Courier New', monospace; font-size: 13px; color: #dd1144; border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px; background-color: #f7f7f9; border: 1px solid #e1e1e8; line-height: 1.5em; "><span style="white-space: inherit; "><span style="color: #c42b23 !important; ">__init__</span></span></code>, directly in the class body.<br />For example:<br /><div>class Widget(object):</div><div><span style="white-space: pre; ">	</span>objID = 0</div><div><span style="white-space: pre; ">	</span>def __init__(self):</div><div><span style="white-space: pre; ">		</span>Widget.objID += 1</div><div><span 
style="white-space: pre; ">		</span># Give each instance its own ID, taken from the running count</div><div><span style="white-space: pre; ">		</span>self.myID = Widget.objID<br /><br /><span style="color: red; ">Common mistake: indentation is very important.</span><br /><span style="color: red; ">Python reports: IndentationError: expected an indented block</span><br /><span style="color: red; ">Another mistake: an instance method must declare the self parameter, even when it takes no other arguments!</span><br /><span style="color: red; ">Another frequent slip: __init__(self, arg) must have four underscores in total.</span><br /><br />Remember to dedent as well as indent.<br /><br /></div><div></div><img src ="http://www.cppblog.com/luyulaile/aggbug/190021.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/luyulaile/" target="_blank">luis</a> 2012-09-09 14:47 <a href="http://www.cppblog.com/luyulaile/archive/2012/09/09/190021.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>