﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++ Blog - mysileng - Post Category - Data Mining</title><link>http://www.cppblog.com/mysileng/category/20515.html</link><description /><language>zh-cn</language><lastBuildDate>Thu, 09 May 2013 08:51:39 GMT</lastBuildDate><pubDate>Thu, 09 May 2013 08:51:39 GMT</pubDate><ttl>60</ttl><item><title>Least Squares Method</title><link>http://www.cppblog.com/mysileng/archive/2013/05/09/200131.html</link><dc:creator>鑫龙</dc:creator><author>鑫龙</author><pubDate>Thu, 09 May 2013 08:24:00 GMT</pubDate><guid>http://www.cppblog.com/mysileng/archive/2013/05/09/200131.html</guid><wfw:comment>http://www.cppblog.com/mysileng/comments/200131.html</wfw:comment><comments>http://www.cppblog.com/mysileng/archive/2013/05/09/200131.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/mysileng/comments/commentRss/200131.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/mysileng/services/trackbacks/200131.html</trackback:ping><description><![CDATA[<p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 24px; background-color: #ffffff; ">The <strong style="margin: 0px; padding: 0px; ">least squares method</strong> (also called the <strong style="margin: 0px; padding: 0px; ">method of least squares</strong>) is a mathematical optimization technique. It finds the best-fitting function for a set of data by <strong style="margin: 0px; padding: 0px; ">minimizing the sum of squared errors</strong>. With least squares, unknown values can be estimated simply, such that the sum of squared errors between the estimates and the actual data is minimal. Least squares can also be used for curve fitting. Some other optimization problems, such as minimizing energy or maximizing entropy, can also be expressed in least-squares form.</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 24px; background-color: #ffffff; "><strong style="margin: 0px; padding: 0px; 
">Principle of the Least Squares Method</strong></p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 24px; background-color: #ffffff; ">When studying the relationship between two variables (x, y), we usually obtain a series of paired data points (x<sub style="margin: 0px; padding: 0px; ">1</sub>, y<sub style="margin: 0px; padding: 0px; ">1</sub>), (x<sub style="margin: 0px; padding: 0px; ">2</sub>, y<sub style="margin: 0px; padding: 0px; ">2</sub>), &#8230;, (x<sub style="margin: 0px; padding: 0px; ">m</sub>, y<sub style="margin: 0px; padding: 0px; ">m</sub>). If, when plotted in the x-y Cartesian coordinate system, these points lie near a straight line, we can write the equation of that line as (Eq. 1-1).</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 24px; background-color: #ffffff; ">Y<sub style="margin: 0px; padding: 0px; ">calc</sub> = a<sub style="margin: 0px; padding: 0px; ">0</sub>&nbsp;+ a<sub style="margin: 0px; padding: 0px; ">1</sub>&nbsp;X&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (<span style="margin: 0px; padding: 0px; color: #0000ff; ">Eq. 1-1</span>)</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 24px; background-color: #ffffff; ">where a<sub style="margin: 0px; padding: 0px; ">0</sub> and a<sub style="margin: 0px; padding: 0px; ">1</sub>&nbsp;are arbitrary real numbers.</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 24px; background-color: #ffffff; ">To establish this line equation, a<sub style="margin: 0px; padding: 0px; ">0</sub> and a<sub style="margin: 0px; padding: 0px; ">1</sub> must be determined. Applying the least squares principle, we compare each measured value Y<sub style="margin: 0px; padding: 0px; ">i</sub> with the value calculated from (Eq. 1-1) (Y<sub style="margin: 0px; padding: 0px; ">calc</sub> = a<sub style="margin: 0px; padding: 0px; ">0</sub> + a<sub style="margin: 0px; padding: 
0px; ">1</sub>X) and take as the &#8220;optimization criterion&#8221; that the sum of squares [&#8721;(Y<sub style="margin: 0px; padding: 0px; ">i</sub>&nbsp;- Y<sub style="margin: 0px; padding: 0px; ">calc</sub>)<sup style="margin: 0px; padding: 0px; ">2</sup>] of the deviations (Y<sub style="margin: 0px; padding: 0px; ">i</sub> - Y<sub style="margin: 0px; padding: 0px; ">calc</sub>) be a minimum.</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 24px; background-color: #ffffff; ">Let: &#966; = &#8721;(Y<sub style="margin: 0px; padding: 0px; ">i</sub>&nbsp;- Y<sub style="margin: 0px; padding: 0px; ">calc</sub>)<sup style="margin: 0px; padding: 0px; ">2</sup>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (<span style="margin: 0px; padding: 0px; color: #0000ff; ">Eq. 1-2</span>)</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 24px; background-color: #ffffff; ">Substituting (Eq. 1-1) into (Eq. 1-2) gives:</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 24px; background-color: #ffffff; ">&#966; = &#8721;(Y<sub style="margin: 0px; padding: 0px; ">i</sub>&nbsp;- a<sub style="margin: 0px; padding: 0px; ">0</sub>&nbsp;- a<sub style="margin: 0px; padding: 0px; ">1</sub>&nbsp;X<sub style="margin: 0px; padding: 0px; ">i</sub>)<sup style="margin: 0px; padding: 0px; ">2</sup>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (<span style="margin: 0px; padding: 0px; color: #0000ff; ">Eq. 1-3</span>)</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 24px; background-color: #ffffff; ">To minimize &#8721;(Y<sub style="margin: 0px; padding: 0px; ">i</sub> - Y<sub style="margin: 0px; padding: 0px; ">calc</sub>)<sup style="margin: 0px; padding: 0px; ">2</sup>, take the partial derivatives of the function &#966; with respect to a<sub 
style="margin: 0px; padding: 0px; ">0</sub> and a<sub style="margin: 0px; padding: 0px; ">1</sub> and set both partial derivatives equal to zero.<br /><img src="http://www.cppblog.com/images/cppblog_com/mysileng/QQ截图20130509162356.jpg" width="397" height="143" alt="" /><br /><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; ">That is:</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; ">m a<sub style="margin: 0px; padding: 0px; ">0</sub>&nbsp;+ (&#8721;X<sub style="margin: 0px; padding: 0px; ">i</sub>) a<sub style="margin: 0px; padding: 0px; ">1</sub>&nbsp;= &#8721;Y<sub style="margin: 0px; padding: 0px; ">i</sub>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; (<span style="margin: 0px; padding: 0px; color: #0000ff; ">Eq. 1-6</span>)</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; ">(&#8721;X<sub style="margin: 0px; padding: 0px; ">i</sub>) a<sub style="margin: 0px; padding: 0px; ">0</sub>&nbsp;+ (&#8721;X<sub style="margin: 0px; padding: 0px; ">i</sub><sup style="margin: 0px; padding: 0px; ">2</sup>) a<sub style="margin: 0px; padding: 0px; ">1</sub>&nbsp;= &#8721;X<sub style="margin: 0px; padding: 0px; ">i</sub>Y<sub style="margin: 0px; padding: 0px; ">i</sub>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (<span style="margin: 0px; padding: 0px; color: #0000ff; ">Eq. 1-7</span>)</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; ">This is a system of two equations in the two unknowns a<sub style="margin: 0px; padding: 0px; ">0</sub> and a<sub style="margin: 0px; padding: 0px; ">1</sub>; solving it yields:</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; ">a<sub style="margin: 0px; padding: 0px; ">0</sub> = (&#8721;Y<sub style="margin: 0px; padding: 0px; ">i</sub>) / m - a<sub style="margin: 0px; padding: 0px; ">1</sub>(&#8721;X<sub style="margin: 0px; padding: 0px; ">i</sub>) / 
m&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;(<span style="margin: 0px; padding: 0px; color: #0000ff; ">Eq. 1-8</span>)</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; ">a<sub style="margin: 0px; padding: 0px; ">1</sub>&nbsp;= [m&#8721;X<sub style="margin: 0px; padding: 0px; ">i</sub>Y<sub style="margin: 0px; padding: 0px; ">i</sub>&nbsp;- (&#8721;X<sub style="margin: 0px; padding: 0px; ">i</sub>)(&#8721;Y<sub style="margin: 0px; padding: 0px; ">i</sub>)] / [m&#8721;X<sub style="margin: 0px; padding: 0px; ">i</sub><sup style="margin: 0px; padding: 0px; ">2</sup>&nbsp;- (&#8721;X<sub style="margin: 0px; padding: 0px; ">i</sub>)<sup style="margin: 0px; padding: 0px; ">2</sup>]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (<span style="margin: 0px; padding: 0px; color: #0000ff; ">Eq. 1-9</span>)</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; ">Substituting these values of a<sub style="margin: 0px; padding: 0px; ">0</sub> and a<sub style="margin: 0px; padding: 0px; ">1</sub> into (Eq. 1-1) gives the fitted univariate linear regression equation, i.e. the mathematical model.</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; ">In regression, the fitted relation generally cannot pass through every data point (x<sub style="margin: 0px; padding: 0px; ">1</sub>, y<sub style="margin: 0px; padding: 0px; ">1</sub>), (x<sub style="margin: 0px; padding: 0px; ">2</sub>, y<sub style="margin: 0px; padding: 0px; ">2</sub>), &#8230;, (x<sub style="margin: 0px; padding: 0px; ">m</sub>, y<sub style="margin: 0px; padding: 0px; ">m</sub>). To judge the quality of the fit, one can use the correlation coefficient &#8220;R&#8221;, the statistic &#8220;F&#8221;, and the residual standard deviation &#8220;S&#8221;: the closer the absolute value of &#8220;R&#8221; is to 1, the better; the larger &#8220;F&#8221; is, the better; the closer &#8220;S&#8221; is to 0, the better.</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; ">R = [&#8721;X<sub style="margin: 0px; padding: 0px; ">i</sub>Y<sub style="margin: 0px; padding: 0px; 
">i</sub>&nbsp;- m (&#8721;X<sub style="margin: 0px; padding: 0px; ">i</sub>&nbsp;/ m)(&#8721;Y<sub style="margin: 0px; padding: 0px; ">i</sub>&nbsp;/ m)] / sqrt{[&#8721;X<sub style="margin: 0px; padding: 0px; ">i</sub><sup style="margin: 0px; padding: 0px; ">2</sup>&nbsp;- m (&#8721;X<sub style="margin: 0px; padding: 0px; ">i</sub>&nbsp;/ m)<sup style="margin: 0px; padding: 0px; ">2</sup>][&#8721;Y<sub style="margin: 0px; padding: 0px; ">i</sub><sup style="margin: 0px; padding: 0px; ">2</sup>&nbsp;- m (&#8721;Y<sub style="margin: 0px; padding: 0px; ">i</sub>&nbsp;/ m)<sup style="margin: 0px; padding: 0px; ">2</sup>]}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;(<span style="margin: 0px; padding: 0px; color: #0000ff; ">Eq. 1-10</span>) *</p><p style="margin-top: 10px; margin-bottom: 10px; padding: 0px; ">In the equations above, m is the sample size, i.e. the number of experiments; X<sub style="margin: 0px; padding: 0px; ">i</sub> and Y<sub style="margin: 0px; padding: 0px; ">i</sub> are the X and Y values of any one experiment.</p><br /></p><img src ="http://www.cppblog.com/mysileng/aggbug/200131.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/mysileng/" target="_blank">鑫龙</a> 2013-05-09 16:24 <a href="http://www.cppblog.com/mysileng/archive/2013/05/09/200131.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>