﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++博客-杰-随笔分类-Optimization</title><link>http://www.cppblog.com/guijie/category/20090.html</link><description>杰哥好,哈哈!</description><language>zh-cn</language><lastBuildDate>Tue, 09 Apr 2019 13:20:33 GMT</lastBuildDate><pubDate>Tue, 09 Apr 2019 13:20:33 GMT</pubDate><ttl>60</ttl><item><title>How to solve AX + XB = C for X using matlab?</title><link>http://www.cppblog.com/guijie/archive/2015/07/06/211161.html</link><dc:creator>杰哥</dc:creator><author>杰哥</author><pubDate>Mon, 06 Jul 2015 07:28:00 GMT</pubDate><guid>http://www.cppblog.com/guijie/archive/2015/07/06/211161.html</guid><wfw:comment>http://www.cppblog.com/guijie/comments/211161.html</wfw:comment><comments>http://www.cppblog.com/guijie/archive/2015/07/06/211161.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/guijie/comments/commentRss/211161.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/guijie/services/trackbacks/211161.html</trackback:ping><description><![CDATA[<font face="Courier New">X = sylvester(A,B,C)</font> <br /><a href="http://cn.mathworks.com/help/matlab/ref/sylvester.html">http://cn.mathworks.com/help/matlab/ref/sylvester.html</a><img src ="http://www.cppblog.com/guijie/aggbug/211161.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/guijie/" target="_blank">杰哥</a> 2015-07-06 15:28 <a href="http://www.cppblog.com/guijie/archive/2015/07/06/211161.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Alternating 
optimization</title><link>http://www.cppblog.com/guijie/archive/2015/05/24/210729.html</link><dc:creator>杰哥</dc:creator><author>杰哥</author><pubDate>Sun, 24 May 2015 04:58:00 GMT</pubDate><guid>http://www.cppblog.com/guijie/archive/2015/05/24/210729.html</guid><wfw:comment>http://www.cppblog.com/guijie/comments/210729.html</wfw:comment><comments>http://www.cppblog.com/guijie/archive/2015/05/24/210729.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/guijie/comments/commentRss/210729.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/guijie/services/trackbacks/210729.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Composite Quantization for Approximate Nearest Neighbor Search (ICML 2014) mentions "alternative optimization" on page 3, left column, fifth line from the bottom. NeNMF: An Optimal Gradient Method for Nonnegative Matrix Factorization mentions "block coordinate descent" on page 2, two lines above Equation 2; Equations 2 and 3 serve as an example. Feature Fusion Using Locally Linear Embedding for Classification cites the reference Some Notes on Alternating Optimization. Two-Dimensional Linear Discriminant Analysis states on page 4: "Due to the difficulty of computing the optimal L and R simultaneously, we derive an iterative algorithm in the following."<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; In my understanding, these concepts are all equivalent.<br /><br /><p style="text-align:justify;text-justify:inter-ideograph;"><span style="font-size:12.0pt; line-height:115%;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;Times New Roman&quot;;">&#8216;alternating optimization&#8217; or &#8216;alternative optimization&#8217;?</span></p>  <p><span style="font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;Times New Roman&quot;;">Sue (UTS) comment: &#8216;Alternating&#8217; means you use this optimization with another optimization, one after the other. 
&#8216;Alternative&#8217; means you use this optimization instead of any other.</span></p>  <p style="text-align:justify;text-justify:inter-ideograph;"><span style="font-size:12.0pt;line-height:115%;font-family:楷体;Times New Roman&quot;;Times New Roman&quot;;">In the end, my </span><span style="font-size:12.0pt;line-height:115%;font-family: &quot;Times New Roman&quot;,&quot;serif&quot;;">GSM-PAF</span><span style="font-size:12.0pt;line-height:115%; font-family:楷体;Times New Roman&quot;;Times New Roman&quot;;"> uses </span><span style="font-size:12.0pt;line-height:115%;font-family: &quot;Times New Roman&quot;,&quot;serif&quot;;Times New Roman&quot;;">&#8216;alternating optimization&#8217;.</span></p><img src ="http://www.cppblog.com/guijie/aggbug/210729.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/guijie/" target="_blank">杰哥</a> 2015-05-24 12:58 <a href="http://www.cppblog.com/guijie/archive/2015/05/24/210729.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Fully understanding maximum likelihood estimation</title><link>http://www.cppblog.com/guijie/archive/2013/12/05/204609.html</link><dc:creator>杰哥</dc:creator><author>杰哥</author><pubDate>Thu, 05 Dec 2013 11:21:00 GMT</pubDate><guid>http://www.cppblog.com/guijie/archive/2013/12/05/204609.html</guid><wfw:comment>http://www.cppblog.com/guijie/comments/204609.html</wfw:comment><comments>http://www.cppblog.com/guijie/archive/2013/12/05/204609.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/guijie/comments/commentRss/204609.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/guijie/services/trackbacks/204609.html</trackback:ping><description><![CDATA[<div style="font-family: Verdana, Arial, Helvetica, sans-serif; line-height: 25px; background-color: #ffffff">
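<p>A minimal sketch of the maximum likelihood recipe summarized in this post, in Python with only the standard library. The Bernoulli model, the sample data, and the function names are illustrative assumptions; they are not taken from the textbook or notes cited here. It shows why the logarithm is taken: the raw product of many probabilities underflows to 0.0, while the average log-likelihood stays finite.</p>

```python
import math

def likelihood(p, samples):
    """Product of Bernoulli probabilities (prone to floating-point underflow)."""
    prod = 1.0
    for x in samples:
        prod *= p if x == 1 else (1.0 - p)
    return prod

def avg_log_likelihood(p, samples):
    """Average log-likelihood of Bernoulli(p) for 0/1 samples."""
    total = 0.0
    for x in samples:
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total / len(samples)

def mle_bernoulli(samples):
    """Closed-form maximizer: setting the derivative of the average
    log-likelihood to zero gives p = (number of ones) / n."""
    return sum(samples) / len(samples)

# Hypothetical data: 3000 coin flips, 900 of them heads.
samples = [1] * 900 + [0] * 2100
p_hat = mle_bernoulli(samples)
print(p_hat)                               # 0.3
print(likelihood(p_hat, samples))          # 0.0, the product has underflowed
print(avg_log_likelihood(p_hat, samples))  # finite, roughly -0.61
```

<p>For models without a closed-form maximizer, the same average log-likelihood would be maximized numerically instead.</p>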
<div style="font-family: Verdana, Arial, Helvetica, sans-serif; line-height: 25px; background-color: #ffffff">This topic belongs to parameter estimation in probability theory and mathematical statistics; see Chapter 7, p. 168 of the textbook, and Section 3.11.1 of the pattern recognition notes (the material from Section 3.11 through Section 3.11.1 should be memorized).<br />Summary: maximum likelihood estimation first assumes that the observed samples follow some distribution; the goal is to estimate the parameters of that distribution. The estimate is the parameter value under which the probability of observing this set of samples is largest. Write down the likelihood function, take its logarithm (the log-likelihood), average it over the samples (the average log-likelihood), differentiate, set the derivative to zero, and solve for the parameters. My current understanding of why the logarithm is needed: probabilities are usually less than 1, so their product becomes extremely small and can easily cause floating-point underflow on a computer; taking the logarithm avoids this.<br />Zhengxia also mentioned that the likelihood is simply a probability: the probability of what was observed.<br /><a href="https://en.wikipedia.org/wiki/Likelihood_function">https://en.wikipedia.org/wiki/Likelihood_function</a></div></div><img src ="http://www.cppblog.com/guijie/aggbug/204609.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/guijie/" target="_blank">杰哥</a> 2013-12-05 19:21 <a href="http://www.cppblog.com/guijie/archive/2013/12/05/204609.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>How to use MATLAB to solve quadratic optimization?</title><link>http://www.cppblog.com/guijie/archive/2012/11/21/195475.html</link><dc:creator>杰哥</dc:creator><author>杰哥</author><pubDate>Wed, 21 Nov 2012 10:31:00 GMT</pubDate><guid>http://www.cppblog.com/guijie/archive/2012/11/21/195475.html</guid><wfw:comment>http://www.cppblog.com/guijie/comments/195475.html</wfw:comment><comments>http://www.cppblog.com/guijie/archive/2012/11/21/195475.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/guijie/comments/commentRss/195475.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/guijie/services/trackbacks/195475.html</trackback:ping><description><![CDATA[Nannan gave me a folder named "Matlab Help". Page 46 of the "Optimization Toolbox User Guide" lists the constraint and objective types together with the corresponding MATLAB function. For example, if the constraints are linear and the objective is quadratic, we can use quadprog. 
Note that quadprog cannot solve $D_1$ in Section 4.1 of "Smooth minimization of non-smooth functions". That problem is max ((X^T)HX) with H positive semidefinite, which maximizes a convex quadratic and is therefore not a convex problem. quadprog cannot solve this kind of problem; it can only solve min ((X^T)HX) with H positive&nbsp;semidefinite.<img src ="http://www.cppblog.com/guijie/aggbug/195475.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/guijie/" target="_blank">杰哥</a> 2012-11-21 18:31 <a href="http://www.cppblog.com/guijie/archive/2012/11/21/195475.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Taylor series in several variables</title><link>http://www.cppblog.com/guijie/archive/2012/10/31/194113.html</link><dc:creator>杰哥</dc:creator><author>杰哥</author><pubDate>Wed, 31 Oct 2012 02:48:00 GMT</pubDate><guid>http://www.cppblog.com/guijie/archive/2012/10/31/194113.html</guid><wfw:comment>http://www.cppblog.com/guijie/comments/194113.html</wfw:comment><comments>http://www.cppblog.com/guijie/archive/2012/10/31/194113.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/guijie/comments/commentRss/194113.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/guijie/services/trackbacks/194113.html</trackback:ping><description><![CDATA[<a href="http://en.wikipedia.org/wiki/Taylor_series">http://en.wikipedia.org/wiki/Taylor_series<br /></a><br /><h2><div>Taylor series in several variables</div></h2><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; background-color: #ffffff; ">The Taylor series may also be generalized to functions of more than one variable with</p><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 
1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="T(x_1,\dots,x_d) = \sum_{n_1=0}^\infty \sum_{n_2=0}^\infty \cdots \sum_{n_d = 0}^\infty  \frac{(x_1-a_1)^{n_1}\cdots (x_d-a_d)^{n_d}}{n_1!\cdots n_d!}\,\left(\frac{\partial^{n_1 + \cdots + n_d}f}{\partial x_1^{n_1}\cdots \partial x_d^{n_d}}\right)(a_1,\dots,a_d).\!" src="http://upload.wikimedia.org/math/4/d/c/4dcdce33f7547fa071b03162da46011e.png" style="border: none; vertical-align: middle; " /></dd></dl><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; background-color: #ffffff; ">For example, for a function that depends on two variables,&nbsp;<em>x</em>&nbsp;and&nbsp;<em>y</em>, the Taylor series to second order about the point (<em>a</em>,&nbsp;<em>b</em>) is:</p><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt=" \begin{align} f(x,y) &amp; \approx f(a,b) +(x-a)\, f_x(a,b) +(y-b)\, f_y(a,b) \\ &amp; {}\quad + \frac{1}{2!}\left[ (x-a)^2\,f_{xx}(a,b) + 2(x-a)(y-b)\,f_{xy}(a,b) +(y-b)^2\, f_{yy}(a,b) \right], \end{align} " src="http://upload.wikimedia.org/math/5/5/8/5587e7367ecb9029926201c9747966b2.png" style="border: none; vertical-align: middle; " /></dd></dl><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; background-color: #ffffff; ">where the subscripts denote the respective&nbsp;<a href="http://en.wikipedia.org/wiki/Partial_derivative" title="Partial derivative" style="text-decoration: none; color: #0b0080; background-image: none; background-position: initial initial; background-repeat: initial initial; ">partial derivatives</a>.</p><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; background-color: #ffffff; 
">A second-order Taylor series expansion of a scalar-valued function of more than one variable can be written compactly as</p><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="T(\mathbf{x}) = f(\mathbf{a}) + \mathrm{D} f(\mathbf{a})^T (\mathbf{x} - \mathbf{a})  + \frac{1}{2!} (\mathbf{x} - \mathbf{a})^T \,\{\mathrm{D}^2 f(\mathbf{a})\}\,(\mathbf{x} - \mathbf{a}) + \cdots\! \,," src="http://upload.wikimedia.org/math/1/b/1/1b18d64f090816b648f9b8c670fee944.png" style="border: none; vertical-align: middle; " /></dd></dl><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; background-color: #ffffff; ">where&nbsp;<img alt="D f(\mathbf{a})\!" src="http://upload.wikimedia.org/math/7/f/8/7f8573267a5f6c8b18351ed36f448b26.png" style="border: none; vertical-align: middle; margin: 0px; " />&nbsp;is the&nbsp;<a href="http://en.wikipedia.org/wiki/Gradient" title="Gradient" style="text-decoration: none; color: #0b0080; background-image: none; background-position: initial initial; background-repeat: initial initial; ">gradient</a>&nbsp;of&nbsp;<img alt="\,f" src="http://upload.wikimedia.org/math/6/9/4/6942cf05cb0188b1e8e3129445991760.png" style="border: none; vertical-align: middle; margin: 0px; " />&nbsp;evaluated at&nbsp;<img alt="\mathbf{x} = \mathbf{a}" src="http://upload.wikimedia.org/math/f/d/6/fd60bb92d6515047c05fcbe64203f9b5.png" style="border: none; vertical-align: middle; margin: 0px; " />&nbsp;and&nbsp;<img alt="D^2 f(\mathbf{a})\!" 
src="http://upload.wikimedia.org/math/9/0/f/90f708ee09a65096888e2680b23d85a5.png" style="border: none; vertical-align: middle; margin: 0px; " />&nbsp;is the&nbsp;<a href="http://en.wikipedia.org/wiki/Hessian_matrix" title="Hessian matrix" style="text-decoration: none; color: #0b0080; background-image: none; background-position: initial initial; background-repeat: initial initial; ">Hessian matrix</a>. Applying the&nbsp;<a href="http://en.wikipedia.org/wiki/Multi-index_notation" title="Multi-index notation" style="text-decoration: none; color: #0b0080; background-image: none; background-position: initial initial; background-repeat: initial initial; ">multi-index notation</a>&nbsp;the Taylor series for several variables becomes</p><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="T(\mathbf{x}) = \sum_{|\alpha| \ge 0}^{}\frac{(\mathbf{x}-\mathbf{a})^{\alpha}}{\alpha!}\,({\mathrm{\partial}^{\alpha}}\,f)(\mathbf{a})\,," src="http://upload.wikimedia.org/math/3/1/3/3132ef7dcdff08ec111f709c341b62c4.png" style="border: none; vertical-align: middle; " /></dd></dl><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; background-color: #ffffff; ">which is to be understood as a still more abbreviated&nbsp;<a href="http://en.wikipedia.org/wiki/Multi-index" title="Multi-index" style="text-decoration: none; color: #0b0080; background-image: none; background-position: initial initial; background-repeat: initial initial; ">multi-index</a>&nbsp;version of the first equation of this paragraph, again in full analogy to the single variable case.</p><h3><span id="Example">Example</span></h3><div 
style="clear: right; float: right; margin: 0.5em 0px 1.3em 1.4em; width: auto; background-color: #ffffff; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; "><div style="min-width: 100px; border: 1px solid #cccccc; background-color: #f9f9f9; font-size: 12px; text-align: center; overflow: hidden; padding: 3px !important; width: 202px; "><a href="http://en.wikipedia.org/wiki/File:Taylor_e%5Exln1plusy.png" style="text-decoration: none; color: #0b0080; background-image: none; background-position: initial initial; background-repeat: initial initial; "><img alt="" src="http://upload.wikimedia.org/wikipedia/en/thumb/1/10/Taylor_e%5Exln1plusy.png/200px-Taylor_e%5Exln1plusy.png" width="200" height="212" srcset="//upload.wikimedia.org/wikipedia/en/thumb/1/10/Taylor_e%5Exln1plusy.png/300px-Taylor_e%5Exln1plusy.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/1/10/Taylor_e%5Exln1plusy.png/400px-Taylor_e%5Exln1plusy.png 2x" style="border: 1px solid #cccccc; vertical-align: middle; background-color: #ffffff; " /></a><div style="border: none; line-height: 1.4em; font-size: 11px; padding: 3px !important; text-align: left; ">Second-order Taylor series approximation (in gray) of a function&nbsp;<img alt="f(x,y) = e^x\log{(1+y)}" 
src="http://upload.wikimedia.org/math/b/3/a/b3a9fe7e653a16d66eba3dce3d6ef061.png" style="border: none; vertical-align: middle; " /> around the origin.</div></div></div><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; background-color: #ffffff; ">Compute a second-order Taylor series expansion around point&nbsp;<img alt="(a,b) = (0,0)" src="http://upload.wikimedia.org/math/0/2/6/0261739a8dee10f946d3f50c4ece0955.png" style="border: none; vertical-align: middle; margin: 0px; " />&nbsp;of a function</p><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="f(x,y)=e^x\log(1+y).\," src="http://upload.wikimedia.org/math/5/a/b/5ab1d5bc1938ec988d5afb6319a4b036.png" style="border: none; vertical-align: middle; " /></dd></dl><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; background-color: #ffffff; ">Firstly, we compute all partial derivatives we need</p><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="f_x(a,b)=e^x\log(1+y)\bigg|_{(x,y)=(0,0)}=0\,," 
src="http://upload.wikimedia.org/math/9/9/4/994f641bfba0ce56f3aabcf5d6f3cc94.png" style="border: none; vertical-align: middle; " /></dd></dl><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="f_y(a,b)=\frac{e^x}{1+y}\bigg|_{(x,y)=(0,0)}=1\,," src="http://upload.wikimedia.org/math/f/d/3/fd33fe83b97b66ae057fcfd468d41fc6.png" style="border: none; vertical-align: middle; " /></dd></dl><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="f_{xx}(a,b)=e^x\log(1+y)\bigg|_{(x,y)=(0,0)}=0\,," src="http://upload.wikimedia.org/math/2/c/c/2cccbc3571085347cfea76d9c2943cb6.png" style="border: none; vertical-align: middle; " /></dd></dl><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="f_{yy}(a,b)=-\frac{e^x}{(1+y)^2}\bigg|_{(x,y)=(0,0)}=-1\,," src="http://upload.wikimedia.org/math/0/d/6/0d6b81524defff97f2a73020a308fd9a.png" style="border: none; vertical-align: middle; " /></dd></dl><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="f_{xy}(a,b)=f_{yx}(a,b)=\frac{e^x}{1+y}\bigg|_{(x,y)=(0,0)}=1." 
src="http://upload.wikimedia.org/math/2/3/0/2301155d1d8495ae4efa330167370975.png" style="border: none; vertical-align: middle; " /></dd></dl><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; background-color: #ffffff; ">The Taylor series is</p><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="\begin{align} T(x,y) = f(a,b) &amp; +(x-a)\, f_x(a,b) +(y-b)\, f_y(a,b) \\ &amp;+\frac{1}{2!}\left[ (x-a)^2\,f_{xx}(a,b) + 2(x-a)(y-b)\,f_{xy}(a,b) +(y-b)^2\, f_{yy}(a,b) \right]+ \cdots\,,\end{align}" src="http://upload.wikimedia.org/math/d/a/d/dad3055d4695f7e70e10c5e403a92112.png" style="border: none; vertical-align: middle; " /></dd></dl><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; background-color: #ffffff; ">which in this case becomes</p><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="\begin{align}T(x,y) &amp;= 0 + 0(x-0) + 1(y-0) + \frac{1}{2}\Big[ 0(x-0)^2 + 2(x-0)(y-0) + (-1)(y-0)^2 \Big] + \cdots \\ &amp;= y + xy - \frac{y^2}{2} + \cdots. 
\end{align} " src="http://upload.wikimedia.org/math/f/e/2/fe2770b7f29c8d32af5ca24d26b9cd99.png" style="border: none; vertical-align: middle; " /></dd></dl><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; background-color: #ffffff; ">Since&nbsp;<span style="white-space: nowrap; ">log(1 +&nbsp;<em>y</em>)</span>&nbsp;is analytic in |<em>y</em>|&nbsp;&lt;&nbsp;1, we have</p><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="e^x\log(1+y)= y + xy - \frac{y^2}{2} + \cdots" src="http://upload.wikimedia.org/math/6/c/b/6cb330326a4a453381d633888ac49343.png" style="border: none; vertical-align: middle; " /></dd></dl><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; background-color: #ffffff; ">for |<em>y</em>|&nbsp;&lt;&nbsp;1.</p><img src ="http://www.cppblog.com/guijie/aggbug/194113.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/guijie/" target="_blank">杰哥</a> 2012-10-31 10:48 <a href="http://www.cppblog.com/guijie/archive/2012/10/31/194113.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Jensen's inequality</title><link>http://www.cppblog.com/guijie/archive/2012/10/30/194080.html</link><dc:creator>杰哥</dc:creator><author>杰哥</author><pubDate>Tue, 30 Oct 2012 04:04:00 
GMT</pubDate><guid>http://www.cppblog.com/guijie/archive/2012/10/30/194080.html</guid><wfw:comment>http://www.cppblog.com/guijie/comments/194080.html</wfw:comment><comments>http://www.cppblog.com/guijie/archive/2012/10/30/194080.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/guijie/comments/commentRss/194080.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/guijie/services/trackbacks/194080.html</trackback:ping><description><![CDATA[<h1><a href="http://en.wikipedia.org/wiki/Jensen's_inequality">http://en.wikipedia.org/wiki/Jensen's_inequality<br /><br /></a><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; font-weight: normal; background-color: #ffffff; ">If&nbsp;<em>&#955;</em><sub style="line-height: 1em; ">1</sub>&nbsp;and&nbsp;<em>&#955;</em><sub style="line-height: 1em; ">2</sub>&nbsp;are two arbitrary nonnegative real numbers such that&nbsp;<em>&#955;</em><sub style="line-height: 1em; ">1</sub>&nbsp;+&nbsp;<em>&#955;</em><sub style="line-height: 1em; ">2</sub>&nbsp;=&nbsp;1 then <span style="color: red; ">convexity of&nbsp;</span><img alt="\scriptstyle\varphi" src="http://upload.wikimedia.org/math/4/5/5/455136e0a43e7634fcc7d2904c0612d9.png" style="border: none; vertical-align: middle; margin: 0px; " />&nbsp;implies</p><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; font-weight: normal; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="\varphi(\lambda_1 x_1+\lambda_2 x_2)\leq \lambda_1\,\varphi(x_1)+\lambda_2\,\varphi(x_2)\text{ for any }x_1,\,x_2." 
src="http://upload.wikimedia.org/math/e/b/2/eb21b34dec237f54ffd9a2a6033e9960.png" style="border: none; vertical-align: middle; " />&nbsp; <span style="color: red;">[This is exactly the definition of a convex function]</span></dd></dl><p style="margin: 0.4em 0px 0.5em; line-height: 19.200000762939453px; font-family: sans-serif; font-size: 13px; font-weight: normal; background-color: #ffffff; ">This can be easily generalized: if&nbsp;<em>&#955;</em><sub style="line-height: 1em; ">1</sub>,&nbsp;<em>&#955;</em><sub style="line-height: 1em; ">2</sub>, ...,&nbsp;<em>&#955;</em><sub style="line-height: 1em; "><em>n</em></sub>&nbsp;are nonnegative real numbers such that&nbsp;<em>&#955;</em><sub style="line-height: 1em; ">1</sub>&nbsp;+&nbsp;...&nbsp;+&nbsp;<em>&#955;</em><sub style="line-height: 1em; "><em>n</em></sub>&nbsp;=&nbsp;1, then</p><dl style="margin-top: 0.2em; margin-bottom: 0.5em; font-family: sans-serif; font-size: 13px; font-weight: normal; line-height: 19.200000762939453px; background-color: #ffffff; "><dd style="line-height: 1.5em; margin-left: 1.6em; margin-bottom: 0.1em; margin-right: 0px; "><img alt="\varphi(\lambda_1 x_1+\lambda_2 x_2+\cdots+\lambda_n x_n)\leq \lambda_1\,\varphi(x_1)+\lambda_2\,\varphi(x_2)+\cdots+\lambda_n\,\varphi(x_n)," src="http://upload.wikimedia.org/math/7/d/c/7dc94248492f5fee40de9728b5d5f0e7.png" style="border: none; vertical-align: middle; " /><br /><br /><span style="color: red; ">For example, -log(x) is a convex function.</span></dd></dl></h1><img src ="http://www.cppblog.com/guijie/aggbug/194080.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/guijie/" target="_blank">杰哥</a> 2012-10-30 12:04 <a href="http://www.cppblog.com/guijie/archive/2012/10/30/194080.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Gradient 
Descent (two examples, from two excellent papers that both use it to solve their objective functions)</title><link>http://www.cppblog.com/guijie/archive/2012/10/19/193522.html</link><dc:creator>杰哥</dc:creator><author>杰哥</author><pubDate>Fri, 19 Oct 2012 05:33:00 GMT</pubDate><guid>http://www.cppblog.com/guijie/archive/2012/10/19/193522.html</guid><wfw:comment>http://www.cppblog.com/guijie/comments/193522.html</wfw:comment><comments>http://www.cppblog.com/guijie/archive/2012/10/19/193522.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/guijie/comments/commentRss/193522.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/guijie/services/trackbacks/193522.html</trackback:ping><description><![CDATA[<div><a href="http://en.wikipedia.org/wiki/Gradient_descent">http://en.wikipedia.org/wiki/Gradient_descent</a>&nbsp;<br /><a href="http://zh.wikipedia.org/wiki/%E6%9C%80%E9%80%9F%E4%B8%8B%E9%99%8D%E6%B3%95">http://zh.wikipedia.org/wiki/%E6%9C%80%E9%80%9F%E4%B8%8B%E9%99%8D%E6%B3%95<br /></a>&nbsp;<span style="line-height: 19px; background-color: #ffffff; font-family: sans-serif; font-size: 13px">Gradient descent is based on the observation that if the multivariable function&nbsp;</span><img style="border-bottom: medium none; border-left: medium none; line-height: 19px; background-color: #ffffff; margin: 0px; font-family: sans-serif; font-size: 13px; vertical-align: middle; border-top: medium none; border-right: medium none" alt="F(\mathbf{x})" src="http://upload.wikimedia.org/math/3/a/0/3a0ca721afe3aad135f6519f182aff29.png" /><span style="line-height: 19px; background-color: #ffffff; font-family: sans-serif; font-size: 13px">&nbsp;is&nbsp;</span><a style="background-image: none; line-height: 19px; background-color: #ffffff; font-family: sans-serif; color: #0b0080; font-size: 13px; text-decoration: none" title="Defined and undefined" href="http://en.wikipedia.org/wiki/Defined_and_undefined">defined</a><span style="line-height: 19px; background-color: #ffffff; font-family: sans-serif; 
font-size: 13px">&nbsp;and&nbsp;</span><a style="background-image: none; line-height: 19px; background-color: #ffffff; font-family: sans-serif; color: #0b0080; font-size: 13px; text-decoration: none" title="Differentiable function" href="http://en.wikipedia.org/wiki/Differentiable_function">differentiable</a><span style="line-height: 19px; background-color: #ffffff; font-family: sans-serif; font-size: 13px">&nbsp;in a neighborhood of a point&nbsp;</span><img style="border-bottom: medium none; border-left: medium none; line-height: 19px; background-color: #ffffff; margin: 0px; font-family: sans-serif; font-size: 13px; vertical-align: middle; border-top: medium none; border-right: medium none" alt="\mathbf{a}" src="http://upload.wikimedia.org/math/3/c/4/3c47f830945ee6b24984ab0ba188e10e.png" /><span style="line-height: 19px; background-color: #ffffff; font-family: sans-serif; font-size: 13px">, then&nbsp;</span><img style="border-bottom: medium none; border-left: medium none; line-height: 19px; background-color: #ffffff; margin: 0px; font-family: sans-serif; font-size: 13px; vertical-align: middle; border-top: medium none; border-right: medium none" alt="F(\mathbf{x})" src="http://upload.wikimedia.org/math/3/a/0/3a0ca721afe3aad135f6519f182aff29.png" /><span style="line-height: 19px; background-color: #ffffff; font-family: sans-serif; font-size: 13px">&nbsp;decreases&nbsp;</span><em style="line-height: 19px; background-color: #ffffff; font-family: sans-serif; font-size: 13px">fastest</em><span style="line-height: 19px; background-color: #ffffff; font-family: sans-serif; font-size: 13px">&nbsp;if one goes from&nbsp;</span><img style="border-bottom: medium none; border-left: medium none; line-height: 19px; background-color: #ffffff; margin: 0px; font-family: sans-serif; font-size: 13px; vertical-align: middle; border-top: medium none; border-right: medium none" alt="\mathbf{a}" src="http://upload.wikimedia.org/math/3/c/4/3c47f830945ee6b24984ab0ba188e10e.png" /><span 
style="line-height: 19px; background-color: #ffffff; font-family: sans-serif; font-size: 13px">&nbsp;in the direction of the negative gradient of&nbsp;</span><img style="border-bottom: medium none; border-left: medium none; line-height: 19px; background-color: #ffffff; margin: 0px; font-family: sans-serif; font-size: 13px; vertical-align: middle; border-top: medium none; border-right: medium none" alt="F" src="http://upload.wikimedia.org/math/8/0/0/800618943025315f869e4e1f09471012.png" /><span style="line-height: 19px; background-color: #ffffff; font-family: sans-serif; font-size: 13px">&nbsp;at&nbsp;</span><img style="border-bottom: medium none; border-left: medium none; line-height: 19px; background-color: #ffffff; margin: 0px; font-family: sans-serif; font-size: 13px; vertical-align: middle; border-top: medium none; border-right: medium none" alt="\mathbf{a}" src="http://upload.wikimedia.org/math/3/c/4/3c47f830945ee6b24984ab0ba188e10e.png" /><span style="line-height: 19px; background-color: #ffffff; font-family: sans-serif; font-size: 13px">,&nbsp;</span><img style="border-bottom: medium none; border-left: medium none; line-height: 19px; background-color: #ffffff; margin: 0px; font-family: sans-serif; font-size: 13px; vertical-align: middle; border-top: medium none; border-right: medium none" alt="-\nabla F(\mathbf{a})" src="http://upload.wikimedia.org/math/4/a/2/4a23b0d19f87c671df7c9ec09ece1bb3.png" />&nbsp;<br />Why should the step size change? Tianyi's explanation is good: if the step size is too large, the function value may increase, so the step size has to be reduced (the picture below was drawn on paper and then scanned).<br />In Andrew Ng's Coursera course Machine Learning, <span style="text-align: justify; text-transform: none; background-color: rgb(255,255,255); text-indent: 0px; letter-spacing: normal; display: inline !important; font: 13px/18px Verdana, Helvetica, Arial; white-space: normal; float: none; color: rgb(94,94,94); word-spacing: 0px; -webkit-text-stroke-width: 0px">II. 
Linear Regression with One Variable</span>, <span style="font-family: 'Calibri','sans-serif'; font-size: 10.5pt; mso-bidi-font-size: 11.0pt; mso-ascii-theme-font: minor-latin; mso-fareast-font-family: 宋体; mso-fareast-theme-font: minor-fareast; mso-hansi-theme-font: minor-latin; mso-bidi-font-family: 'Times New Roman'; mso-bidi-theme-font: minor-bidi; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA" lang="EN-US">Gradient descent Intuition</span><span style="font-family: 宋体; font-size: 10.5pt; mso-bidi-font-size: 11.0pt; mso-ascii-theme-font: minor-latin; mso-fareast-theme-font: minor-fareast; mso-hansi-theme-font: minor-latin; mso-bidi-font-family: 'Times New Roman'; mso-bidi-theme-font: minor-bidi; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA; mso-ascii-font-family: Calibri; mso-hansi-font-family: Calibri"> explains this well: at the point on the right in the figure below, the gradient is positive, so<font size="2" face="Arial">&nbsp;<img style="border-bottom: medium none; border-left: medium none; line-height: 19px; background-color: #ffffff; margin: 0px; font-family: sans-serif; font-size: 13px; vertical-align: middle; border-top: medium none; border-right: medium none" alt="-\nabla F(\mathbf{a})" src="http://upload.wikimedia.org/math/4/a/2/4a23b0d19f87c671df7c9ec09ece1bb3.png" /> is negative, which makes the current a decrease.</font></span><br />
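<div><img alt="" src="http://www.cppblog.com/images/cppblog_com/guijie/4823_001.jpg" width="528" height="1001" /></div><span style="color: red">Example 1</span>: Fig. 1 (Normalized graph Laplacian learning algorithm) of Toward the Optimization of Normalized&nbsp;Graph Laplacian (TNN 2011) is a good example of gradient descent. Reading Fig. 1 is enough; the rest of the paper can be skipped. Fig. 1 corresponds to the fourth slide on page 6 of Tao Shuning's Nonlinear Optimization lecture slides (textbook p. 124). The key is the line search strategy, which applies the fourth slide on page 4 of the same Nonlinear Optimization slides: doubling or halving the step size. Whenever the objective decreases, move to the next search point and double the step size; otherwise stay at the current point and halve the step size.<br /><span style="color: red">Example 2</span>:&nbsp;Distance Metric Learning for Large Margin Nearest Neighbor Classification (JMLR). The objective function is Eq. 14. It is written with quadratic forms of the matrix M; after expanding it, one finds that it is linear in M and therefore convex. The derivative with respect to M (the formula between Eqs. 18 and 19 in the appendix) does not contain M.<br /><br /><span style="color: red">My own additional thought: if the function is convex, why not simply set the partial derivatives with respect to the variables to zero and solve for the variables? Why use gradient descent at all? That does not work in Example 2, because the derivative with respect to M no longer depends on M. After discussing this with Tianyi: gradient descent is used precisely when setting the derivative to zero has no closed-form solution; when a closed-form solution exists, the problem is already solved.<br /><br /></span>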
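<p>The doubling/halving rule described in Example 1 can be sketched as follows. This is a minimal Python illustration, not code from the cited paper or slides: the function name <code>adaptive_gradient_descent</code>, the tolerance, the iteration cap, and the test objective F(x) = (x - 3)^2 are all assumptions made for the example.</p>
<pre>
```python
def adaptive_gradient_descent(f, grad, x0, step=1.0, tol=1e-8, max_iter=1000):
    """Gradient descent that doubles the step after a successful move
    and halves it (staying in place) after an unsuccessful one."""
    x = float(x0)
    fx = f(x)
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            # The gradient is (almost) zero, so stop.
            break
        candidate = x - step * g
        f_candidate = f(candidate)
        if f_candidate < fx:
            # Objective decreased: accept the point and double the step.
            x, fx = candidate, f_candidate
            step *= 2.0
        else:
            # Objective did not decrease: stay at x and halve the step.
            step *= 0.5
    return x


# Hypothetical objective F(x) = (x - 3)^2 with gradient 2 (x - 3).
x_min = adaptive_gradient_descent(
    lambda x: (x - 3.0) ** 2,
    lambda x: 2.0 * (x - 3.0),
    x0=0.0,
)
print(x_min)
```
</pre>
<p>The iteration cap guards against the step shrinking forever when no decrease can be found; a real line search would usually add a stronger acceptance test than a plain decrease.</p>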
<div><span style="color: red">http://blog.csdn.net/yudingjun0611/article/details/8147046</span></div>
<p style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b"><span style="font-family: SimSun; font-size: 18px"><strong>1. Gradient descent</strong></span></p>
<p style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b"><span style="font-family: SimSun; font-size: 12px"><span style="white-space: pre"></span>For the principle behind gradient descent, see <a style="color: #6a3906; text-decoration: " href="http://blog.csdn.net/abcjennifer/article/details/7691571">Stanford Machine Learning, Lecture 1</a>.</span></p>
<p style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b"><span style="font-family: SimSun; font-size: 12px"><span style="white-space: pre"></span>The data used in my experiment are 100 two-dimensional points.</span></p>
<p style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b"><span style="font-family: SimSun"><span style="font-size: 12px">If gradient descent does not behave properly, consider a smaller step size (that is, a smaller learning rate α). Two points to keep in mind:</span></span></p>
<p style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b"><span style="font-family: SimSun"><span style="font-size: 12px">1) For a sufficiently small α, the cost J(θ) is guaranteed to decrease at every step;</span></span><span style="font-family: SimSun"><span style="font-size: 12px"><br /></span></span><span style="font-family: SimSun"><span style="font-size: 12px">2) but if α is too small, gradient descent converges very slowly.</span></span></p>
<p style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b"><span style="font-family: SimSun"><span style="font-size: 12px">Summary:</span></span><span style="font-family: SimSun"><span style="font-size: 12px"><br /></span></span><span style="font-family: SimSun"><span style="font-size: 12px">1) If α is too small, convergence is slow;</span></span><span style="font-family: SimSun"><span style="font-size: 12px"><br /></span></span><span style="font-family: SimSun"><span style="font-size: 12px">2) if α is too large, J(θ) is not guaranteed to decrease at every iteration, so convergence is not guaranteed.</span></span><span style="font-family: SimSun"><span style="font-size: 12px"><br /></span></span><span style="font-family: SimSun"><span style="font-size: 12px">How to choose α, a rule of thumb:</span></span><span style="font-family: SimSun"><span style="font-size: 12px"><br /></span></span><span style="font-family: SimSun"><span style="font-size: 12px">..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1...</span></span><span style="font-family: SimSun"><span style="font-size: 12px"><br /></span></span><span style="font-family: SimSun"><span style="font-size: 12px">Each value is about 3 times the previous one.</span></span></p>
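<p>A quick way to compare candidate learning rates is to record the cost after every iteration and check whether it ever goes up. The sketch below is in Python. The data (five points on y = 1 + 2x), the helper names <code>cost</code> and <code>run_gradient_descent</code>, and the 50-iteration budget are assumptions for illustration only; this is not the 100-point data set used in the experiment.</p>
<pre>
```python
def cost(theta, xs, ys):
    """Half mean squared error of the line y = theta[0] + theta[1] * x."""
    m = len(xs)
    return sum((theta[0] + theta[1] * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def run_gradient_descent(alpha, xs, ys, iterations=50):
    """Run batch gradient descent and return the cost after every iteration."""
    theta = [0.0, 0.0]
    m = len(xs)
    history = [cost(theta, xs, ys)]
    for _ in range(iterations):
        errors = [theta[0] + theta[1] * x - y for x, y in zip(xs, ys)]
        g0 = sum(errors) / m                              # d cost / d theta[0]
        g1 = sum(e * x for e, x in zip(errors, xs)) / m   # d cost / d theta[1]
        theta = [theta[0] - alpha * g0, theta[1] - alpha * g1]
        history.append(cost(theta, xs, ys))
    return history


# Made-up data lying exactly on y = 1 + 2 x (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]

results = {}
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    history = run_gradient_descent(alpha, xs, ys)
    # True when the cost never increased from one iteration to the next.
    results[alpha] = all(b <= a for a, b in zip(history, history[1:]))
    print(alpha, history[-1], "never increased" if results[alpha] else "cost went up")
```
</pre>
<p>Among the rates whose cost never increases, the largest one usually reaches a low cost in the fewest iterations, which is the trade-off the summary above describes.</p>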
<p style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b"><span style="font-family: SimSun">MATLAB source code:</span></p>
<p style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b"><span style="font-family: SimSun"></span></p>
<div style="line-height: 26px; width: 693px; color: #362e2b" class="dp-highlighter bg_cpp">
<ol class="dp-cpp"><li style="line-height: 18px" class="alt">function&nbsp;[theta0,theta1]=Gradient_descent(X,Y)&nbsp;&nbsp;</li>
<li style="line-height: 18px">%&nbsp;Batch gradient descent for the line y = theta0 + theta1*x&nbsp;&nbsp;</li>
<li style="line-height: 18px" class="alt">theta0=0;&nbsp;&nbsp;</li>
<li style="line-height: 18px">theta1=0;&nbsp;&nbsp;</li>
<li style="line-height: 18px" class="alt">alpha=0.000001;&nbsp;%&nbsp;learning rate&nbsp;&nbsp;</li>
<li style="line-height: 18px">m=size(X,1);&nbsp;%&nbsp;number of sample points (100 in the experiment)&nbsp;&nbsp;</li>
<li style="line-height: 18px" class="alt"><span class="keyword">while</span>(1)&nbsp;&nbsp;</li>
<li style="line-height: 18px">&nbsp;&nbsp;&nbsp;&nbsp;t0=0;&nbsp;%&nbsp;gradient sums, reset on every pass&nbsp;&nbsp;</li>
<li style="line-height: 18px" class="alt">&nbsp;&nbsp;&nbsp;&nbsp;t1=0;&nbsp;&nbsp;</li>
<li style="line-height: 18px">&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">for</span>&nbsp;i=1:m&nbsp;&nbsp;</li>
<li style="line-height: 18px" class="alt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;t0=t0+(theta0+theta1*X(i,1)-Y(i,1))*1;&nbsp;&nbsp;</li>
<li style="line-height: 18px">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;t1=t1+(theta0+theta1*X(i,1)-Y(i,1))*X(i,1);&nbsp;&nbsp;</li>
<li style="line-height: 18px" class="alt">&nbsp;&nbsp;&nbsp;&nbsp;end&nbsp;&nbsp;</li>
<li style="line-height: 18px">&nbsp;&nbsp;&nbsp;&nbsp;old_theta0=theta0;&nbsp;&nbsp;</li>
<li style="line-height: 18px" class="alt">&nbsp;&nbsp;&nbsp;&nbsp;old_theta1=theta1;&nbsp;&nbsp;</li>
<li style="line-height: 18px">&nbsp;&nbsp;&nbsp;&nbsp;theta0=theta0-alpha*t0;&nbsp;%&nbsp;trailing semicolons suppress per-iteration printing&nbsp;&nbsp;</li>
<li style="line-height: 18px" class="alt">&nbsp;&nbsp;&nbsp;&nbsp;theta1=theta1-alpha*t1;&nbsp;&nbsp;</li>
<li style="line-height: 18px">&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">if</span>(sqrt((old_theta0-theta0)^2+(old_theta1-theta1)^2)&lt;0.000001)&nbsp;%&nbsp;convergence test; other criteria are possible&nbsp;&nbsp;</li>
<li style="line-height: 18px" class="alt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">break</span>;&nbsp;&nbsp;</li>
<li style="line-height: 18px">&nbsp;&nbsp;&nbsp;&nbsp;end&nbsp;&nbsp;</li>
<li style="line-height: 18px" class="alt">end&nbsp;&nbsp;</li></ol></div><br style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b" /><br style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b" />
<p style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b"></p>
<p style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b"><span style="font-family: SimSun; font-size: 18px"><strong>2. Stochastic gradient descent</strong></span></p>
<p style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b"><span style="font-family: SimSun"><span style="white-space: pre"></span>Stochastic gradient descent suits problems with a very large number of sample points. It updates the parameters from one sample at a time, so that overall the iterates move in a direction in which the gradient descends quickly.</span></p>
<p style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b"><span style="font-family: SimSun">MATLAB source code:</span></p>
<p style="line-height: 26px; background-color: #ffffff; font-family: Arial; color: #362e2b"><span style="font-family: SimSun"></span></p>
<div style="line-height: 26px; width: 693px; color: #362e2b" class="dp-highlighter bg_cpp">
<ol class="dp-cpp"><li style="line-height: 18px" class="alt">function&nbsp;[theta0,theta1]=Gradient_descent_rand(X,Y)&nbsp;&nbsp;</li><li style="line-height: 18px">theta0=0;&nbsp;&nbsp;</li><li style="line-height: 18px" class="alt">theta1=0;&nbsp;&nbsp;</li><li style="line-height: 18px">alpha=0.01;&nbsp;%&nbsp;learning rate&nbsp;&nbsp;</li><li style="line-height: 18px" class="alt">%&nbsp;one update per sample point, in a single pass over the data&nbsp;&nbsp;</li><li style="line-height: 18px"><span class="keyword">for</span>&nbsp;i=1:size(X,1)&nbsp;%&nbsp;100 points in the experiment&nbsp;&nbsp;</li><li style="line-height: 18px" class="alt">&nbsp;&nbsp;&nbsp;&nbsp;t0=theta0-alpha*(theta0+theta1*X(i,1)-Y(i,1))*1;&nbsp;&nbsp;</li><li style="line-height: 18px">&nbsp;&nbsp;&nbsp;&nbsp;t1=theta1-alpha*(theta0+theta1*X(i,1)-Y(i,1))*X(i,1);&nbsp;&nbsp;</li><li style="line-height: 18px" class="alt">&nbsp;&nbsp;&nbsp;&nbsp;theta0=t0;&nbsp;%&nbsp;assign only after both updates are computed&nbsp;&nbsp;</li><li style="line-height: 18px">&nbsp;&nbsp;&nbsp;&nbsp;theta1=t1;&nbsp;&nbsp;</li><li style="line-height: 18px" class="alt">end &nbsp;</li></ol></div><span style="color: red"><br /></span></div><img src ="http://www.cppblog.com/guijie/aggbug/193522.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/guijie/" target="_blank">杰哥</a> 2012-10-19 13:33 <a href="http://www.cppblog.com/guijie/archive/2012/10/19/193522.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>[zz] Newton-Raphson algorithm</title><link>http://www.cppblog.com/guijie/archive/2012/10/16/193347.html</link><dc:creator>杰哥</dc:creator><author>杰哥</author><pubDate>Mon, 15 Oct 2012 23:21:00 
GMT</pubDate><guid>http://www.cppblog.com/guijie/archive/2012/10/16/193347.html</guid><wfw:comment>http://www.cppblog.com/guijie/comments/193347.html</wfw:comment><comments>http://www.cppblog.com/guijie/archive/2012/10/16/193347.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/guijie/comments/commentRss/193347.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/guijie/services/trackbacks/193347.html</trackback:ping><description><![CDATA[<a href="http://blog.csdn.net/flyingworm_eley/article/details/6517853">http://blog.csdn.net/flyingworm_eley/article/details/6517853</a>&nbsp;<br /><br /><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">The Newton-Raphson algorithm is widely used in statistics to compute maximum likelihood estimates (MLE) of parameters.</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">The single-variable case is shown in the figure below:</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp;</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; "><img alt="" src="http://pic002.cnblogs.com/images/2010/183335/2010111818172813.jpg" style="border: none; " /></p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">The algorithm for multivariate functions:</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp;</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; "><img alt="" src="http://pic002.cnblogs.com/images/2010/183335/2010111818180517.jpg" style="border: none; " /></p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp;</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp;</p>
<p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">Example (implemented in R):</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; "># define the function f(x)</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">f=function(x){<br />&nbsp; &nbsp; 1/x+1/(1-x)<br />}</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; "># define f_d1 as the first derivative</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">f_d1=function(x){<br />&nbsp; &nbsp; -1/x^2+1/(x-1)^2<br />}</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; "># define f_d2 as the second derivative</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">f_d2=function(x){<br />&nbsp; &nbsp; 2/x^3-2/(x-1)^3<br />}</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp;</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; "># NR algorithm<br />NR=function(time,init){<br />&nbsp; &nbsp; X=NULL<br />&nbsp; &nbsp; D1=NULL &nbsp; # stores the first-derivative value at each Xi<br />&nbsp; &nbsp; D2=NULL &nbsp; # stores the second-derivative value at each Xi<br />&nbsp; &nbsp; count=0</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp; &nbsp; X[1]=init<br />&nbsp; &nbsp; l=seq(0.02,0.98,0.0002)<br />&nbsp; &nbsp; plot(l,f(l),pch='.')<br />&nbsp; &nbsp; points(X[1],f(X[1]),pch=2,col=1)</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp;</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp; &nbsp; for (i in 2:time){<br />&nbsp; &nbsp; &nbsp; &nbsp; D1[i-1]=f_d1(X[i-1])<br />&nbsp; &nbsp; &nbsp; &nbsp; D2[i-1]=f_d2(X[i-1])<br />&nbsp; &nbsp; &nbsp; &nbsp; X[i]=X[i-1]-1/(D2[i-1])*(D1[i-1]) &nbsp; # NR iteration formula<br />&nbsp; &nbsp; &nbsp; &nbsp; if (abs(D1[i-1])&lt;0.05)break&nbsp; &nbsp;# stop once the first derivative is close to zero<br />&nbsp; &nbsp; &nbsp; &nbsp; points(X[i],f(X[i]),pch=2,col=i)<br />&nbsp; &nbsp; &nbsp; &nbsp; count=count+1<br />&nbsp; &nbsp; }<br />&nbsp; &nbsp; return(list(x=X,derivative_1=D1,derivative_2=D2,count=count))<br />}</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; "><br />o=NR(30,0.9)</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">The result is shown in the figure below; triangles of different colors mark the estimate Xi produced at the i-th iteration.</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp;</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; "><img alt="" src="http://pic002.cnblogs.com/images/2010/183335/2010111818094318.jpg" style="border: none; " /></p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp;</p>
<p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; "># choose another function f(x)</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">f=function(x){<br />&nbsp; &nbsp; return(exp(3.5*cos(x))+4*sin(x))<br />}</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp;</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">f_d1=function(x){<br />&nbsp; &nbsp; return(-3.5*exp(3.5*cos(x))*sin(x)+4*cos(x))<br />}</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp;</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">f_d2=function(x){<br />&nbsp; &nbsp; return(-4*sin(x)+3.5^2*exp(3.5*cos(x))*(sin(x))^2-3.5*exp(3.5*cos(x))*cos(x))<br />}</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp;</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; "># run NR again with the new f, f_d1 and f_d2<br />o=NR(30,0.9)</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">&nbsp;</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">The result is as follows:</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; "><img alt="" src="http://pic002.cnblogs.com/images/2010/183335/2010111818093092.jpg" style="border: none; " /></p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">Reference from:</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">Kevin Quinn</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">Assistant Professor</p><p style="color: #333333; font-family: Arial; line-height: 26px; text-align: left; background-color: #ffffff; ">Univ Washington</p><img src ="http://www.cppblog.com/guijie/aggbug/193347.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/guijie/" target="_blank">杰哥</a> 2012-10-16 07:21 <a href="http://www.cppblog.com/guijie/archive/2012/10/16/193347.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>