<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++ Blog - MemoryGarden's Blog - Category: Hadoop Streaming</title><link>http://www.cppblog.com/MemoryGarden/category/12861.html</link><description>Keep working hard

-----------大能猫</description><language>zh-cn</language><lastBuildDate>Sun, 24 Jan 2010 05:16:52 GMT</lastBuildDate><pubDate>Sun, 24 Jan 2010 05:16:52 GMT</pubDate><ttl>60</ttl><item><title>Implementing a partitioner and modularization for Hadoop Streaming in C++ &amp;&amp; Python</title><link>http://www.cppblog.com/MemoryGarden/archive/2010/01/24/106312.html</link><dc:creator>memorygarden</dc:creator><author>memorygarden</author><pubDate>Sat, 23 Jan 2010 19:47:00 GMT</pubDate><guid>http://www.cppblog.com/MemoryGarden/archive/2010/01/24/106312.html</guid><wfw:comment>http://www.cppblog.com/MemoryGarden/comments/106312.html</wfw:comment><comments>http://www.cppblog.com/MemoryGarden/archive/2010/01/24/106312.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/MemoryGarden/comments/commentRss/106312.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/MemoryGarden/services/trackbacks/106312.html</trackback:ping><description><![CDATA[<br>
These notes reflect my own understanding. If anything is wrong, or if I took a roundabout path somewhere, please point it out. Thanks!<br>
<br>
Hadoop Streaming is a utility that lets you run a map-reduce job with arbitrary executables instead of writing Java implementation classes.<br>
<br>
Workflow:<br>
<br>
InputFile --&gt; mappers --&gt; [Partitioner] --&gt; reducers --&gt; outputFiles<br>
<br>
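My understanding:<br>
1. Input: the input files. You can give a directory in the remote file system followed by *, e.g. dir/*.<br>
2. The cluster splits the input and distributes it across the machines. Each mapper is an executable, and a process is started for it to run your logic.<br>
3. A mapper reads from standard input. Anything executable that can read stdin can therefore serve as a mapper or a reducer: a compiled C or C++ program, a Python script, and so on. The mapper writes its results to standard output, and those records then go through the partitioner to the reducers.<br>
4. The partitioner is optional; without one, Hadoop uses its default hash partitioner. It is what makes the "secondary sort" possible. It looks at each record the mappers emit and decides which reducer receives it, so different kinds of results can be routed into different output files. Each reducer then reads the records routed to it on standard input.<br>
5. Reducers work the same way as mappers.<br>
6. Output directory: a directory with the same name must not already exist on the remote file system.<br>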
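To make the flow above concrete, here is a minimal single-process model of it in Python. It is only a sketch under simplifying assumptions:<br>
- run_streaming_model, count_mapper, count_reducer and crc_partition are hypothetical names.<br>
- The partitioner is a stand-in based on zlib.crc32; Hadoop's default partitioner hashes keys differently.<br>
- Everything runs in memory instead of on a cluster.<br>
It does show why each key ends up at exactly one reducer, and why there is one output per reduce task even when some of them are empty.<br>

```python
# Toy, single-process model of the streaming data flow:
# input lines -> mapper -> partitioner -> sort -> reducer (one output per reducer)
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def count_mapper(line: str) -> Iterator[Pair]:
    # like a streaming mapper: turn one input line into (key, value) records
    fields = line.split()
    if len(fields) >= 2:
        yield fields[0], fields[1]


def count_reducer(records: List[Pair]) -> Iterator[str]:
    # sum the values of each key and emit "key\tcount" lines
    counts: Dict[str, int] = {}
    for key, value in records:
        counts[key] = counts.get(key, 0) + int(value)
    for key in sorted(counts):
        yield "%s\t%d" % (key, counts[key])


def crc_partition(key: str, num_reducers: int) -> int:
    # stand-in hash partitioner (assumption: Hadoop's default one hashes differently)
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_streaming_model(lines: Iterable[str],
                        mapper: Callable[[str], Iterable[Pair]],
                        reducer: Callable[[List[Pair]], Iterable[str]],
                        num_reducers: int,
                        partition: Callable[[str, int], int] = crc_partition) -> Dict[int, List[str]]:
    buckets: Dict[int, List[Pair]] = defaultdict(list)
    for line in lines:                        # map phase
        for key, value in mapper(line):
            # partition phase: the partitioner picks one reducer per key
            buckets[partition(key, num_reducers)].append((key, value))
    # sort + reduce phase: each reducer sees its records sorted by key and
    # produces one output (one part-0000N file), which may be empty
    return {r: list(reducer(sorted(buckets[r]))) for r in range(num_reducers)}


parts = run_streaming_model(["10.2.3.40 1", "www.baidu.com 1", "10.2.3.40 1"],
                            count_mapper, count_reducer, num_reducers=2)
for r, output in parts.items():
    print("part-%05d:" % r, output)
```

Each entry of parts plays the role of one part-0000N file in the output directory.<br>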
<br>
Hadoop Streaming<br>
<br>
1: hadoop-streaming.jar is located in $HADOOP_HOME/contrib/streaming<br>
<br>
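The official documentation on hadoop-streaming is already very detailed and even includes Python examples, so I won't repeat it. Here I'll just summarize my own experience.<br>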
<br>
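1. Setting the number of map or reduce tasks. The official docs use -jobconf for this. That option is deprecated: it still works, but it prints a warning, as the log further down shows. Use -D instead.<br>
-D must appear first, before all the streaming options, because these settings have to be fixed before the mappers and reducers run:<br>
./bin/hadoop jar hadoop-0.19.2-streaming.jar -D ......... -input ........ and so on.<br>
With this, you get one result file per reduce task you specified, even if your program ends up writing all its output through a single channel. The extra files are simply empty; a quick experiment will confirm this.<br>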
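The ordering rule for -D is easy to get wrong when typing the command by hand. Here is a hypothetical helper for it; build_streaming_cmd is not part of Hadoop. It always places the -D key=value pairs right after the jar name and only then appends the streaming options.<br>
It also passes the partitioner settings through -D. That rests on an assumption: that -D can replace -jobconf for them as well, as the deprecation warning in the log suggests.<br>

```python
# Hypothetical helper (not part of Hadoop): build a hadoop-streaming command
# with the generic -D options placed before the streaming options.
import shlex
from typing import Dict, List


def build_streaming_cmd(jar: str, conf: Dict[str, str],
                        streaming_args: List[str]) -> List[str]:
    cmd = ["./bin/hadoop", "jar", jar]
    for key, value in conf.items():          # generic options first ...
        cmd += ["-D", "%s=%s" % (key, value)]
    return cmd + list(streaming_args)        # ... then -input, -output, -mapper, ...


cmd = build_streaming_cmd(
    "hadoop-0.19.2-streaming.jar",
    {"mapred.reduce.tasks": "2",
     "map.output.key.field.separator": "&",
     "num.key.fields.for.partition": "1"},
    ["-input", "user_login_day-input2/*", "-output", "user_login_day-output102"],
)
# shlex.quote adds the single quotes a shell needs around '&' and '*'
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing with shlex.quote shows exactly where a shell needs quotes: the '&' setting and the input glob. Passing the list straight to subprocess.run(cmd) would skip the shell entirely, so no quoting would be needed at all.<br>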
<br>
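2. About secondary sort. With streaming, the executables can only handle the processing logic and the output. We can still request a secondary sort, but it is configured entirely through command-line parameters, so it is not very flexible. For example:<br>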
10.2.3.40 &nbsp;&nbsp; 1<br>
11.22.33.33&nbsp;&nbsp;&nbsp; 1<br>
www.renren.com 1<br>
www.baidu.com&nbsp;&nbsp;&nbsp; 1<br>
10.2.3.40&nbsp;&nbsp;&nbsp; 1<br>
<br>
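For a neatly structured input file like this, the requirement is to count each distinct IP and URL, with the IP counts and the URL counts written to separate output files.<br>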
<br>
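The example on the official site splits the key into fields. The full key is used for sorting, and a primary key (its leading fields) is used for the secondary sort, i.e. for partitioning. That gives you exactly the output you want. But how do we express the simple requirement above through these parameters?<br>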
<br>
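In fact we can still use this mechanism. In the mapper we still separate key and value with \t, but we give the key a primary key that the partitioner can use for the secondary sort. It is easy to tell an IP from a URL, so in the mapper we artificially prepend a "tag" to every key to serve as that primary key. For example, the mapper output looks like this:<br>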
<br>
D&amp;10.2.3.40 &nbsp;&nbsp; 1<br>
D&amp;11.22.33.33&nbsp;&nbsp;&nbsp; 1<br>
W&amp;www.renren.com 1<br>
W&amp;www.baidu.com&nbsp;&nbsp;&nbsp; 1<br>
D&amp;10.2.3.40&nbsp;&nbsp;&nbsp; 1
<br>
<br>
Then pass these parameters on the command line:<br>
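<pre class="code">-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner //use the key-field-based partitioner (the "secondary sort")<br>-jobconf map.output.key.field.separator='&amp;' //without the single quotes my command broke: an unquoted &amp; is treated by the shell as the "run in background" operator<br>-jobconf num.key.fields.for.partition=1 //partition on the first field only, i.e. the text before the first &amp;, so nothing goes wrong<br><br>This way the partitioner takes care of the secondary sort for us.<br><br>In the reducer we strip the "tag" off again (trivially easy), and the secondary sort is done without leaving a trace.<br><br>3: Modularization<br><br>(Note: tested only on a single machine, not on a cluster.)<br><br>The saddest thing for a programmer is being unable to reuse code, and hadoop-streaming is no exception, so code reuse was the first thing I considered.<br>When I saw the -file option (see the official documentation for details), I thought of using it. Sure enough, it worked. I created a Python module on my machine containing one simple function,<br>imported it in my mapper, and once that passed a local test, shipped the module's directory to streaming with the parameter -file moudle/*.<br>The job ran without any errors. This way we can factor common code out into modules and meet our modularization needs.<br><br>Note: don't forget to run chmod +x *.py to make the Python scripts executable; otherwise they will not run.<br><br>Code:<br><br>1: Module code, mg.py, which tags the mapper output<br><br>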
<div style="border: 1px solid #cccccc; padding: 4px 5px 4px 4px; background-color: #eeeeee; font-size: 13px; width: 98%;">def mgFunction(line):<br>&nbsp;&nbsp;&nbsp;&nbsp;# Prepend the tag the partitioner uses as the primary key:<br>&nbsp;&nbsp;&nbsp;&nbsp;# "D&amp;" for keys starting with a digit (IPs, numeric IDs), "W&amp;" otherwise (URLs)<br>&nbsp;&nbsp;&nbsp;&nbsp;if line[:1].isdigit():<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return "D&amp;" + line<br>&nbsp;&nbsp;&nbsp;&nbsp;return "W&amp;" + line<br></div>
<br><br>2: mapper.py <br><br>
<div style="border: 1px solid #cccccc; padding: 4px 5px 4px 4px; background-color: #eeeeee; font-size: 13px; width: 98%;">#!/usr/bin/env python<br>import sys<br># local testing: the module directory on this machine<br>sys.path.append('/home/liuguoqing/Desktop/hadoop-0.19.2/moudle')<br># on the task nodes, files shipped with -file are placed in the task's working directory<br>sys.path.append('.')<br>import mg<br><br>for line in sys.stdin:<br>&nbsp;&nbsp;&nbsp;&nbsp;line = line.strip()<br>&nbsp;&nbsp;&nbsp;&nbsp;if not line:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;continue&nbsp;&nbsp;# skip blank lines<br>&nbsp;&nbsp;&nbsp;&nbsp;line = mg.mgFunction(line)&nbsp;&nbsp;# prepend the "D&amp;" / "W&amp;" tag to the key<br>&nbsp;&nbsp;&nbsp;&nbsp;words = line.split()<br>&nbsp;&nbsp;&nbsp;&nbsp;# emit "key\tvalue", e.g. "D&amp;10.2.3.40\t1"<br>&nbsp;&nbsp;&nbsp;&nbsp;print('%s\t%s' % (words[0], words[1]))<br></div>
<br></pre>
3: reducer.py<br>
<br>
<div style="border: 1px solid #cccccc; padding: 4px 5px 4px 4px; background-color: #eeeeee; font-size: 13px; width: 98%;">#!/usr/bin/env python<br>import sys<br>user_login_day = {}<br><br>for line in sys.stdin:<br>&nbsp;&nbsp;&nbsp;&nbsp;line = line[2:]&nbsp;&nbsp;# strip the 2-character tag ("D&amp;" or "W&amp;")<br>&nbsp;&nbsp;&nbsp;&nbsp;line = line.strip()<br>&nbsp;&nbsp;&nbsp;&nbsp;if not line:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;continue<br>&nbsp;&nbsp;&nbsp;&nbsp;userid, day = line.split('\t', 1)<br>&nbsp;&nbsp;&nbsp;&nbsp;# count the records for each key (every mapper record carries a 1)<br>&nbsp;&nbsp;&nbsp;&nbsp;user_login_day[userid] = user_login_day.get(userid, 0) + 1<br><br>for uid in user_login_day:<br>&nbsp;&nbsp;&nbsp;&nbsp;print('%s\t%d' % (uid, user_login_day[uid]))<br></div>
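The reducer above keeps a dict of every key it has seen. Hadoop sorts each reducer's input by key, so equal keys arrive next to each other. That means a reducer can also count them in a single pass, without holding every key in memory.<br>
Below is a sketch of that alternative using itertools.groupby. It is a swapped-in technique, not the author's code, and parse_records, reduce_sorted and main are hypothetical names. Like the original, it strips the two-character tag and counts records per key rather than summing their values.<br>

```python
# Alternative reducer sketch: relies on Hadoop delivering each reducer's
# records sorted by key, so identical keys are adjacent.
import sys
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def parse_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    for line in lines:
        line = line.rstrip("\n")
        if len(line) > 2:
            # drop the 2-character "D&" / "W&" tag, then split key and value at the first tab
            key, _, value = line[2:].partition("\t")
            yield key, value


def reduce_sorted(lines: Iterable[str]) -> Iterator[str]:
    # groupby only merges *adjacent* equal keys, which is what sorted input guarantees
    for key, group in groupby(parse_records(lines), key=lambda kv: kv[0]):
        yield "%s\t%d" % (key, sum(1 for _ in group))  # count records, like the original


def main() -> None:
    for out in reduce_sorted(sys.stdin):
        print(out)
```

Used as the -reducer script, the file would simply end with a call to main(). Memory use then stays flat no matter how many distinct keys a reducer receives.<br>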
<br>
<br>
That gives us a modular hadoop-streaming job with secondary sort.<br>
<br>
Command:<br>
<br>
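./bin/hadoop jar hadoop-0.19.2-streaming.jar \<br>
-D mapred.reduce.tasks=2 \<br>
-input user_login_day-input2/* \<br>
-output user_login_day-output102 \<br>
-mapper ~/Desktop/hadoop-0.19.2/python/mapper/get_user_login_day_back.py \<br>
-reducer ~/Desktop/hadoop-0.19.2/python/reducer/get_user_login_day_back.py \<br>
-file ~/Desktop/hadoop-0.19.2/moudle/* \<br>
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \<br>
-jobconf map.output.key.field.separator='&amp;' \<br>
-jobconf num.key.fields.for.partition=1<br>
<br>
What each part does:<br>
hadoop-0.19.2-streaming.jar: the streaming jar<br>
-D mapred.reduce.tasks=2: use 2 reduce tasks<br>
-input: the input files; the dir/* form works<br>
-output: the output directory<br>
-mapper: the mapper executable. I used the full path, since a relative path seemed to cause errors...<br>
-reducer: the reducer executable<br>
-file: the module library files, in dir/* form<br>
-partitioner: the partitioner; the argument is a class name<br>
-jobconf map.output.key.field.separator='&amp;': the primary-key field separator is '&amp;'<br>
-jobconf num.key.fields.for.partition=1: partition on the first field, i.e. the text before the first '&amp;'<br>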
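How does the tag end up deciding the output file? The sketch below is a simplified model of KeyFieldBasedPartitioner with the settings used here: separator '&amp;' and one partition field. partition_for is a hypothetical name, not Hadoop's API.<br>
The hashing details are an assumption on my part. The model treats the partitioner as a Java-style rolling hash (h = 31*h + byte) over the first key field, taken modulo the number of reduce tasks.<br>

```python
# Simplified model of KeyFieldBasedPartitioner as configured in this post:
# map.output.key.field.separator='&', num.key.fields.for.partition=1.
# Assumption: a Java-style rolling hash (h = 31*h + byte) over the first key
# field, modulo the number of reduce tasks. A sketch, not Hadoop's source.


def partition_for(key: str, num_reducers: int, sep: str = "&") -> int:
    first_field = key.split(sep, 1)[0]          # "D&10.2.3.40" -> "D"
    h = 0
    for b in first_field.encode("utf-8"):       # ASCII tags, so byte signedness does not matter
        h = (31 * h + b) & 0xFFFFFFFF           # emulate 32-bit int overflow
    return (h & 0x7FFFFFFF) % num_reducers      # drop the sign bit, then take the modulo


# with 2 reducers: the D& keys go to 0, the W& keys to 1
for k in ["D&10.2.3.40", "D&11.22.33.33", "W&www.renren.com", "W&www.baidu.com"]:
    print(k, "->", partition_for(k, 2))
```

Under this model, with 2 reduce tasks the 'D' tag goes to reducer 0 and 'W' to reducer 1. That is consistent with the part-00000 / part-00001 split in the run below.<br>
It also shows the catch: this is still hashing, so two different tags can land on the same reducer. With 2 reducers, 'D' and 'F' collide. The one-tag-per-file split therefore depends on which tags and how many reducers you choose.<br>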
<br>
liuguoqing@liuguoqing-desktop:~/Desktop/hadoop-0.19.2$ ./bin/hadoop jar
hadoop-0.19.2-streaming.jar -D mapred.reduce.tasks=2 -input
user_login_day-input2/* -output user_login_day-output102 -mapper
~/Desktop/hadoop-0.19.2/python/mapper/get_user_login_day_back.py
-reducer
~/Desktop/hadoop-0.19.2/python/reducer/get_user_login_day_back.py -file
~/Desktop/hadoop-0.19.2/moudle/* -partitioner
org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner -jobconf
map.output.key.field.separator='&amp;' -jobconf
num.key.fields.for.partition=1<br>10/01/24 03:19:15 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.<br>packageJobJar:
[/home/liuguoqing/Desktop/hadoop-0.19.2/moudle/mg.py,
/home/liuguoqing/Desktop/hadoop-0.19.2/moudle/mg.pyc,
/tmp/hadoop-liuguoqing/hadoop-unjar6780057097425964518/] []
/tmp/streamjob3100401358387519950.jar tmpDir=null<br>10/01/24 03:19:15 INFO mapred.FileInputFormat: Total input paths to process : 2<br>10/01/24 03:19:15 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-liuguoqing/mapred/local]<br>10/01/24 03:19:15 INFO streaming.StreamJob: Running job: job_201001221008_0065<br>10/01/24 03:19:15 INFO streaming.StreamJob: To kill this job, run:<br>10/01/24
03:19:15 INFO streaming.StreamJob:
/home/liuguoqing/Desktop/hadoop-0.19.2/bin/../bin/hadoop job&nbsp;
-Dmapred.job.tracker=hdfs://localhost:9881 -kill job_201001221008_0065<br>10/01/24 03:19:15 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201001221008_0065<br>10/01/24 03:19:16 INFO streaming.StreamJob:&nbsp; map 0%&nbsp; reduce 0%<br>10/01/24 03:19:17 INFO streaming.StreamJob:&nbsp; map 33%&nbsp; reduce 0%<br>10/01/24 03:19:18 INFO streaming.StreamJob:&nbsp; map 67%&nbsp; reduce 0%<br>10/01/24 03:19:19 INFO streaming.StreamJob:&nbsp; map 100%&nbsp; reduce 0%<br>10/01/24 03:19:27 INFO streaming.StreamJob:&nbsp; map 100%&nbsp; reduce 50%<br>10/01/24 03:19:32 INFO streaming.StreamJob:&nbsp; map 100%&nbsp; reduce 100%<br>10/01/24 03:19:32 INFO streaming.StreamJob: Job complete: job_201001221008_0065<br>10/01/24 03:19:32 INFO streaming.StreamJob: Output: user_login_day-output102<br>liuguoqing@liuguoqing-desktop:~/Desktop/hadoop-0.19.2$ ./bin/hadoop dfs -ls user_login_day-output102<br>Found 3 items<br>drwxr-xr-x&nbsp;&nbsp; - liuguoqing supergroup&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 2010-01-24 03:19 /user/liuguoqing/user_login_day-output102/_logs<br>-rw-r--r--&nbsp;&nbsp; 1 liuguoqing supergroup&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 25 2010-01-24 03:19 /user/liuguoqing/user_login_day-output102/part-00000<br>-rw-r--r--&nbsp;&nbsp; 1 liuguoqing supergroup&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 47 2010-01-24 03:19 /user/liuguoqing/user_login_day-output102/part-00001<br><br>
liuguoqing@liuguoqing-desktop:~/Desktop/hadoop-0.19.2$ ./bin/hadoop dfs -cat user_login_day-output102/part-00000<br>54321&nbsp;&nbsp; &nbsp;2<br>99999&nbsp;&nbsp; &nbsp;1<br>12345&nbsp;&nbsp; &nbsp;12<br>liuguoqing@liuguoqing-desktop:~/Desktop/hadoop-0.19.2$ ./bin/hadoop dfs -cat user_login_day-output102/part-00001<br>http://www.renren.com&nbsp;&nbsp; &nbsp;3<br>http://www.baidu.com&nbsp;&nbsp; &nbsp;3<br><br>
The above is the output of the run.<br>
<br>
<br>
4: Using C++<br>
<br>
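Just write a mapper and a reducer that read standard input and write standard output, then compile them:<br>
g++ mapper.cpp -o mapper<br>
g++ reducer.cpp -o reducer<br>
Pass the two resulting executables, mapper and reducer, as the -mapper and -reducer arguments. The rest of the command is the same as above.<br>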
<br>
Code:<br>
<br>
mapper.cpp<br>
<div style="border: 1px solid #cccccc; padding: 4px 5px 4px 4px; background-color: #eeeeee; font-size: 13px; width: 98%;">#include&nbsp;&lt;iostream&gt;<br>#include&nbsp;&lt;string&gt;<br>using namespace std;<br><br>int main(){<br>&nbsp;&nbsp;&nbsp;&nbsp;string key;<br>&nbsp;&nbsp;&nbsp;&nbsp;string value;<br>&nbsp;&nbsp;&nbsp;&nbsp;// read whitespace-separated "key value" pairs and emit them as "key\tvalue"<br>&nbsp;&nbsp;&nbsp;&nbsp;while(cin &gt;&gt; key &gt;&gt; value){<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cout &lt;&lt; key &lt;&lt; "\t" &lt;&lt; value &lt;&lt; endl;<br>&nbsp;&nbsp;&nbsp;&nbsp;}<br>&nbsp;&nbsp;&nbsp;&nbsp;return 0;<br>}<br></div>
<br>
<br>
reducer.cpp<br>
<br>
<div style="border: 1px solid #cccccc; padding: 4px 5px 4px 4px; background-color: #eeeeee; font-size: 13px; width: 98%;">#include&nbsp;&lt;iostream&gt;<br>#include&nbsp;&lt;map&gt;<br>#include&nbsp;&lt;string&gt;<br>using namespace std;<br><br>int main(){<br>&nbsp;&nbsp;&nbsp;&nbsp;string key;<br>&nbsp;&nbsp;&nbsp;&nbsp;string value;<br>&nbsp;&nbsp;&nbsp;&nbsp;map&lt;string, int&gt; word2count;<br>&nbsp;&nbsp;&nbsp;&nbsp;// count how many times each key occurs<br>&nbsp;&nbsp;&nbsp;&nbsp;while(cin &gt;&gt; key &gt;&gt; value){<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;++word2count[key];&nbsp;&nbsp;// operator[] inserts 0 for a new key, then it is incremented<br>&nbsp;&nbsp;&nbsp;&nbsp;}<br><br>&nbsp;&nbsp;&nbsp;&nbsp;for(map&lt;string, int&gt;::iterator it = word2count.begin(); it != word2count.end(); ++it){<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cout &lt;&lt; it-&gt;first &lt;&lt; "\t" &lt;&lt; it-&gt;second &lt;&lt; endl;<br>&nbsp;&nbsp;&nbsp;&nbsp;}<br>&nbsp;&nbsp;&nbsp;&nbsp;return 0;<br>}<br></div>
<br>
<br>
That is how you can write Hadoop map-reduce jobs in C++.<br>
<br>
<br>
Note: none of the above has been tested on a cluster. If you spot mistakes, please point them out. Thanks!<br>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/MemoryGarden/" target="_blank">memorygarden</a> 2010-01-24 03:47</div>]]></description></item><item><title>Setting up Hadoop on a single machine</title><link>http://www.cppblog.com/MemoryGarden/archive/2010/01/24/106311.html</link><dc:creator>memorygarden</dc:creator><author>memorygarden</author><pubDate>Sat, 23 Jan 2010 18:19:00 GMT</pubDate><guid>http://www.cppblog.com/MemoryGarden/archive/2010/01/24/106311.html</guid><wfw:comment>http://www.cppblog.com/MemoryGarden/comments/106311.html</wfw:comment><comments>http://www.cppblog.com/MemoryGarden/archive/2010/01/24/106311.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/MemoryGarden/comments/commentRss/106311.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/MemoryGarden/services/trackbacks/106311.html</trackback:ping><description><![CDATA[The <a title="Official site" href="http://hadoop.apache.org/common/docs/r0.18.2/cn/quickstart.html">official site</a> explains the setup clearly.<br><br>The version I use is <a title="0.19.2" href="http://apache.etoak.com/hadoop/core/hadoop-0.19.2/hadoop-0.19.2.tar.gz">0.19.2</a>.<br><br>One setting in the official configuration needs to be changed. For everything else, just follow the official instructions and the single-machine setup will come up.<br><br>Original:<br><br>
<div style="border: 1px solid #cccccc; padding: 4px 5px 4px 4px; background-color: #eeeeee; font-size: 13px; width: 98%;"><!--<br><br>Code highlighting produced by Actipro CodeHighlighter (freeware)<br>http://www.CodeHighlighter.com/<br><br>--><span style="color: #000000;">Configuration<br><br>Use the following&nbsp;conf/hadoop-site.xml:<br></span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">configuration</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">property</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">name</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;">fs.default.name</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">name</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">value</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;">localhost:9000</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">value</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">property</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">property</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">name</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;">mapred.job.tracker</span><span style="color: 
#0000ff;">&lt;/</span><span style="color: #800000;">name</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">value</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;">localhost:9001</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">value</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">property</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">property</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">name</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;">dfs.replication</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">name</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">value</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;">1</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">value</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">property</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br></span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">configuration</span><span style="color: #0000ff;">&gt;</span></div>
<br>After the change:<br><br>
<div style="border: 1px solid #cccccc; padding: 4px 5px 4px 4px; background-color: #eeeeee; font-size: 13px; width: 98%;"><!--<br><br>Code highlighting produced by Actipro CodeHighlighter (freeware)<br>http://www.CodeHighlighter.com/<br><br>--><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">configuration</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">property</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">name</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;">fs.default.name</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">name</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">value</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;">hdfs://localhost:9900/</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">value</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">property</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">property</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">name</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;">mapred.job.tracker</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">name</span><span style="color: 
#0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">value</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;">hdfs://localhost:9901/</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">value</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">property</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">property</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">name</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;">dfs.replication</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">name</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;</span><span style="color: #800000;">value</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;">1</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">value</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br>&nbsp;&nbsp;</span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">property</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br></span><span style="color: #0000ff;">&lt;/</span><span style="color: #800000;">configuration</span><span style="color: #0000ff;">&gt;</span><span style="color: #000000;"><br></span></div>
<br><br>Otherwise the log will contain warnings. (As far as I can tell, the warning concerns fs.default.name: a value without the hdfs:// scheme, such as localhost:9000, is reported as a deprecated filesystem name.)<br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/MemoryGarden/" target="_blank">memorygarden</a> 2010-01-24 02:19</div>]]></description></item></channel></rss>