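﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++博客 (C++ Blog) - yefeng - Post category - Notes on 《编程之美》</title><link>http://www.cppblog.com/yefeng/category/19354.html</link><description>夜风's blog</description><language>zh-cn</language><lastBuildDate>Wed, 23 May 2012 23:59:25 GMT</lastBuildDate><pubDate>Wed, 23 May 2012 23:59:25 GMT</pubDate><ttl>60</ttl><item><title>《编程之美》: Finding the Number That Occurs More Than Half the Time</title><link>http://www.cppblog.com/yefeng/archive/2012/05/23/175828.html</link><dc:creator>夜风</dc:creator><author>夜风</author><pubDate>Tue, 22 May 2012 16:39:00 GMT</pubDate><guid>http://www.cppblog.com/yefeng/archive/2012/05/23/175828.html</guid><wfw:comment>http://www.cppblog.com/yefeng/comments/175828.html</wfw:comment><comments>http://www.cppblog.com/yefeng/archive/2012/05/23/175828.html#Feedback</comments><slash:comments>4</slash:comments><wfw:commentRss>http://www.cppblog.com/yefeng/comments/commentRss/175828.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/yefeng/services/trackbacks/175828.html</trackback:ping><description><![CDATA[<div> <strong>Problem:</strong><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; The original puzzle is called &#8220;Finding the Top Poster&#8221; (寻找发帖王). Stripped down, it says: among n numbers there is a number x that occurs more than n/2 times; find x with the lowest possible time complexity.<br /><br /><span style="font-weight: bold;">Motivation:</span><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; I came across this problem on the CSDN forum last night when I had nothing better to do. My first thought was: since x occurs more than n/2 times, once the numbers are sorted, the element at position [n/2] must be x. That reduces the problem to &#8220;find the median of a set of numbers&#8221;. 《算法导论》 (Introduction to Algorithms) gives a solution: a divide-and-conquer selection algorithm, similar to quicksort, that runs in O(n) expected time. Its performance depends on how the data is distributed, though, and the worst case degrades to O(n<sup>2</sup>).<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Later I searched online and found, to my surprise, that the problem comes from 《编程之美》 (The Beauty of Programming). I remember reading the book back in college, but I could not recall any of this. When I saw the algorithm in the book, I could not help marveling at it. That gave me an idea: reread the book and write a series of blog posts recording what I learn.<br /><br /><span style="font-weight: bold;">Lemma:</span><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Suppose x occurs more than n/2 times among n numbers. If we remove any pair of unequal numbers, x still occurs more than (n-2)/2 times among the remaining (n-2) numbers; that is, its frequency still exceeds 1/2.<br 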
/><br /><span style="font-weight: bold;">Proof:</span><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Suppose x occurs m times, so m &gt; n/2 and the original frequency is P0 = m/n &gt; 1/2. Remove a pair of unequal numbers &lt;a, b&gt; from the n numbers (assume n &gt; 2, so that some numbers remain). Since a != b, at most one of them equals x, which leaves two cases:<br /><ol><li>a != x, b != x. x still occurs m times, so the new frequency is P1 = m/(n - 2) &gt; m/n &gt; 1/2.<br /></li><li>Exactly one of them equals x, say a = x, b != x. The new frequency is P1 = (m - 1)/(n - 2), and P1 - P0 = (2m - n)/(n(n - 2)) &gt; 0 because 2m &gt; n. Hence P1 &gt; P0 &gt; 1/2.</li></ol><p><span style="font-weight: bold;">Algorithm analysis:</span></p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; The idea is really very simple: pick any number from the pile, find another number that is not equal to it, and throw both away. The problem keeps shrinking, and once no unequal number can be found, we are done: whatever remains is x. To keep the algorithm efficient, though, we cannot rescan everything each time we pick a number. Instead, keep a queue that holds the numbers that cannot be discarded yet. Start from the beginning: put a[0] into the queue, then look at a[1]. If a[0] != a[1], discard both a[1] and a[0], removing a[0] from the queue; if a[0] == a[1], enqueue a[1]. Then handle a[2] the same way (if the queue is empty, a[2] simply goes in), and so on.</p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; This can be optimized further. Clearly, all the numbers in the queue are always equal, so there is no need to store them. It is enough to know two things: 1. which value the imaginary queue holds; 2. how long the queue is.</p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; That gives us the code from 《编程之美》:</p> <div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><span style="color: #0000FF; ">int</span><span style="color: #000000; ">&nbsp;data_more_than_half(</span><span style="color: #0000FF; ">const</span>&nbsp;<span style="color: #0000FF; ">int</span><span style="color: #000000; ">&nbsp;arr[],&nbsp;</span><span style="color: #0000FF; ">const</span><span style="color: #000000; ">&nbsp;size_t&nbsp;size)&nbsp;{<br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000FF; ">int</span><span style="color: #000000; ">&nbsp;candidate&nbsp;=&nbsp;0;&nbsp;&nbsp;</span><span style="color: #008000; ">// value held by the imaginary queue (stays 0 if size is 0)</span><span style="color: #000000; "><br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000FF; ">int</span><span style="color: #000000; ">&nbsp;count&nbsp;=&nbsp;0;&nbsp;&nbsp;</span><span style="color: #008000; ">// length of the imaginary queue</span><span style="color: #000000; "><br /></span><span style="color: #000000; "><br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000FF; ">for</span><span style="color: #000000; ">&nbsp;(size_t&nbsp;i&nbsp;=&nbsp;0;&nbsp;i&nbsp;&lt;&nbsp;size;&nbsp;i++)&nbsp;{<br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000FF; ">if</span><span style="color: #000000; ">&nbsp;(count&nbsp;==&nbsp;0)&nbsp;{<br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000; ">// queue is empty: arr[i] becomes the new candidate</span><span style="color: #000000; "><br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;candidate&nbsp;=&nbsp;arr[i];<br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;count&nbsp;=&nbsp;1;<br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000FF; ">else</span>&nbsp;<span style="color: #0000FF; ">if</span><span style="color: #000000; ">&nbsp;(candidate&nbsp;==&nbsp;arr[i])&nbsp;{<br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;count++;&nbsp;&nbsp;</span><span style="color: #008000; ">// same value: it joins the queue</span><span style="color: #000000; "><br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000FF; ">else</span><span style="color: #000000; ">&nbsp;{<br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;count--;&nbsp;&nbsp;</span><span style="color: #008000; ">// different value: discard it together with one queued number</span><span style="color: #000000; "><br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;}<br /></span><span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000FF; ">return</span><span style="color: #000000; ">&nbsp;candidate;<br /></span><span style="color: #000000; ">}</span></div> <p><br /> </p> 
<p><strong>Application:</strong></p> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; The code looks simple, but I was not quite done with it. While mulling it over, a question came to mind: what happens if the precondition (some number occurs more than half the time) does not hold? And how can we guard against that?</p> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Clearly, the solution rests on that precondition; once it fails, the returned number is meaningless. Fortunately the fix is just as simple: verify that the number found really does occur more than half the time.</p> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; This is a commonly used approach: assume, then verify.</p> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Given an array, assume that some number occurs more than half the time, find the candidate in O(n) time, and then count how often it actually occurs with a second O(n) pass. That closes the gap.</p> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; The same method therefore also solves a closely related problem: <span style="font-weight: bold;">check whether an array contains a number that occurs more than half the time.</span><br /></p>  </div><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/yefeng/" target="_blank">夜风</a> 2012-05-23 00:39</div>]]></description></item></channel></rss>