﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++博客-ken-随笔分类-program</title><link>http://www.cppblog.com/tompson/category/4889.html</link><description /><language>zh-cn</language><lastBuildDate>Thu, 22 May 2008 08:02:01 GMT</lastBuildDate><pubDate>Thu, 22 May 2008 08:02:01 GMT</pubDate><ttl>60</ttl><item><title>网页抓取的程序</title><link>http://www.cppblog.com/tompson/archive/2007/08/11/29773.html</link><dc:creator>ken</dc:creator><author>ken</author><pubDate>Sat, 11 Aug 2007 06:45:00 GMT</pubDate><guid>http://www.cppblog.com/tompson/archive/2007/08/11/29773.html</guid><wfw:comment>http://www.cppblog.com/tompson/comments/29773.html</wfw:comment><comments>http://www.cppblog.com/tompson/archive/2007/08/11/29773.html#Feedback</comments><slash:comments>2</slash:comments><wfw:commentRss>http://www.cppblog.com/tompson/comments/commentRss/29773.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/tompson/services/trackbacks/29773.html</trackback:ping><description><![CDATA[<p>本打算做一个网络爬虫(crawler)的,但水平有限只做了基本的一些功能.&nbsp;思路: 肯定是要能先通过url连接到http服务器了,然后发送一个"GET url \n"的请求才能下载网页. 之后就是分析网页,比如辨认超链接和搜索关键词.<br><br>就是GET 这个东西搞不懂, 有的网页需要给完整的url, 有的只需要相对路径才正确. 怎么才能自动知道需要哪个啊?</p>
<br>source: <a href="http://www.cppblog.com/Files/tompson/getwebpage.rar">http://www.cppblog.com/Files/tompson/getwebpage.rar</a><br>(The code is quite rough; it is offered as a reference for fellow students of network programming.) 
<img src ="http://www.cppblog.com/tompson/aggbug/29773.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/tompson/" target="_blank">ken</a> 2007-08-11 14:45 <a href="http://www.cppblog.com/tompson/archive/2007/08/11/29773.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item></channel></rss>