﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++博客 - IT菜鸟 - Post Category - Tool</title><link>http://www.cppblog.com/micwu/category/18160.html</link><description>Setting sail</description><language>zh-cn</language><lastBuildDate>Sun, 20 Nov 2011 14:08:43 GMT</lastBuildDate><pubDate>Sun, 20 Nov 2011 14:08:43 GMT</pubDate><ttl>60</ttl><item><title>[Tool] HtmlAgilityPack: a tool for parsing HTML</title><link>http://www.cppblog.com/micwu/archive/2011/11/15/160203.html</link><dc:creator>micwu</dc:creator><author>micwu</author><pubDate>Tue, 15 Nov 2011 15:10:00 GMT</pubDate><guid>http://www.cppblog.com/micwu/archive/2011/11/15/160203.html</guid><wfw:comment>http://www.cppblog.com/micwu/comments/160203.html</wfw:comment><comments>http://www.cppblog.com/micwu/archive/2011/11/15/160203.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/micwu/comments/commentRss/160203.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/micwu/services/trackbacks/160203.html</trackback:ping><description><![CDATA[<p align="center"><span style="font-size: 18pt"><strong style="font-family: Arial; font-size: 14pt">Html Agility Pack: a tool for parsing HTML</strong><span style="color: red; font-size: 18pt"><strong><br /></strong><span style="color: red; font-size: 18pt"><span style="color: red; font-size: 18pt"><span style="font-size: 18pt">
<p><span style="font-family: Georgia; color: #000000; font-size: 12pt">The .NET Framework class library does not itself provide a tool for parsing HTML. The usual approaches used to be regular expressions, a browser control, the MSHTML component, or even SgmlReader. SgmlReader converts HTML into XML, which you can then query with the classes in the System.Xml namespace.</span><br /><br /><span style="font-family: Georgia; color: #000000; font-size: 12pt">CodePlex hosts the Html Agility Pack project, a native .NET library that does not depend on MSHTML or ActiveX/COM objects. Its HtmlDocument class can load any HTML file, even one that is not well-formed, and then lets you query the document through an object model similar to System.Xml.</span></p>
<p align="left"><span style="font-family: Georgia; color: #000000; font-size: 12pt">Official site: </span><a href="http://www.codeplex.com/htmlagilitypack"><font color="#000000"><span style="font-family: Georgia; font-size: 12pt">www.codeplex.com/htmlagilitypack</span></font></a><br /><br /><span style="font-family: Georgia; color: #000000; font-size: 12pt">For example:</span>&nbsp;</p>
<div class="cnblogs_code" align="left"><span style="font-family: Arial; color: #000000; font-size: 10pt">// HtmlWeb.Load expects an absolute URL, so a local file is loaded with HtmlDocument.Load<br />HtmlDocument doc = new HtmlDocument();<br />doc.Load("file.htm");<br />// Select every div directly under body (DocumentNode is the document root)<br />HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("/html[1]/body[1]/div");<br />// SelectNodes returns null when nothing matches<br />if (nodes != null)<br />{<br />&nbsp;&nbsp;&nbsp;&nbsp;foreach (HtmlNode node in nodes)<br />&nbsp;&nbsp;&nbsp;&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Console.WriteLine(node.InnerText.Trim());<br />&nbsp;&nbsp;&nbsp;&nbsp;}<br />}<br /></span>&nbsp;<br /></span></div>
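<p>The same pattern can be tried without a file on disk. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name <code>DivTextDemo</code>, the helper <code>GetTopLevelDivTexts</code> and the sample markup are made up for illustration and are not part of the library.</p>

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class DivTextDemo
{
    // Hypothetical helper: returns the trimmed text of each div directly under body.
    public static List<string> GetTopLevelDivTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        var texts = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("/html/body/div");
        if (nodes == null)
        {
            return texts; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode node in nodes)
        {
            texts.Add(node.InnerText.Trim());
        }
        return texts;
    }

    public static void Main()
    {
        // Illustrative input only: two divs and one span under body.
        string html = "<html><body><div> first </div><div>second</div><span>skipped</span></body></html>";
        foreach (string text in GetTopLevelDivTexts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

<p>Returning an empty list instead of null keeps the caller's loop free of null checks, which is a design choice of this sketch rather than something the library does.</p>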
<div><span style="color: #000000; font-size: 12pt"><strong><br />Q: How do you select HTML nodes?</strong></span><br /><span style="color: #000000; font-size: 10pt"><font size="2">// By position in the hierarchy, since HTML is hierarchical<br /></font>HtmlNode node1 = doc.DocumentNode.SelectSingleNode("/html[1]/body[1]/div[1]/div[2]/div[7]/div[1]/div[3]/ol[1]/li[1]/div[1]/div[2]/address");<br /><font size="2">HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("/html[1]/body[1]/div[1]/div[2]/div[3]/div[2]/div[1]/div[1]/div[1]/div");</font><br />// By ID</span><br /><span style="color: #000000; font-size: 10pt">HtmlNode node2 = doc.DocumentNode.SelectSingleNode("//p[@id='prop_detail_qt_prop_1']");<br /><br /><strong>Some useful links:<br /></strong><br />1.&nbsp;<a href="http://hi.baidu.com/huangyunjun999/blog/item/cdd962cabce1d09dc8176868.html">Fast HTML parsing with the open-source Html Agility Pack project</a><br /><br /><font color="#399ab2">2.&nbsp;<a href="http://www.cnblogs.com/huyong/articles/2175216.html"><font color="#399ab2">Html Agility Pack: a tool for analyzing HTML</font> </a></font><br /></span><span style="color: #000000; font-size: 12pt">
<p><span style="font-size: 10pt">3. </span><a href="http://zhoufoxcn.blog.51cto.com/792419/595344" target="_blank"><font color="#000000"><span style="font-size: 10pt">HtmlAgilityPack, a powerful HTML parsing tool</span></font></a></p>
<p><strong><span style="font-family: 'Verdana', 'sans-serif'; color: rgb(255,102,0); font-size: 10pt; mso-font-kerning: 0pt; mso-bidi-font-family: 宋体" lang="EN-US">4. <a href="http://iofeng.com/Ublog/ShowBlog.aspx?id=29" target="_blank"><strong><span style="font-family: 'Verdana', 'sans-serif'; color: rgb(255,102,0); font-size: 10pt; mso-font-kerning: 0pt; mso-bidi-font-family: 宋体" lang="EN-US">C# spider programs: HtmlAgilityPack, a powerful HTML parsing tool</span></strong></a></span></strong></p>
<p><strong><span style="font-family: 'Verdana', 'sans-serif'; color: rgb(255,102,0); font-size: 14pt; mso-font-kerning: 0pt; mso-bidi-font-family: 宋体" lang="EN-US"><span style="font-size: 10pt" class="link_title">5. <a title="Fast HTML parsing with the open-source Html Agility Pack project" href="http://blog.csdn.net/malimalihun/article/details/6683434" target="_blank"><font color="#000000"><span style="font-size: 10pt">Fast HTML parsing with the open-source Html Agility Pack project</span></font></a></span></span></strong></p>
<p><strong><span style="font-family: 'Verdana', 'sans-serif'; color: rgb(255,102,0); font-size: 14pt; mso-font-kerning: 0pt; mso-bidi-font-family: 宋体" lang="EN-US"><span style="font-size: 10pt" class="link_title">6. <a href="http://kb.cnblogs.com/a/1627706/" target="_blank"><strong><span style="font-family: 'Verdana', 'sans-serif'; color: rgb(255,102,0); font-size: 14pt; mso-font-kerning: 0pt; mso-bidi-font-family: 宋体" lang="EN-US"><span style="font-size: 10pt" class="link_title">A very good HTML-to-XML tool: Html Agility Pack</span></span></strong></a></span></span></strong></p><br /></span></div>
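<p>The two selection styles in the Q&amp;A above differ in how fragile they are: an absolute path breaks as soon as the page layout shifts, while an ID lookup does not depend on position. Here is a minimal sketch of the ID approach, assuming the HtmlAgilityPack NuGet package is referenced; <code>SelectByIdDemo</code>, <code>GetTextById</code> and the sample markup are hypothetical names chosen for illustration.</p>

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class SelectByIdDemo
{
    // Hypothetical helper: returns the trimmed text of the element with the given id,
    // or null when no such element exists.
    // The id is assumed to contain no quote characters, since it is spliced into the XPath.
    public static string GetTextById(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//*[@id='...']" searches the whole tree, so the result does not
        // depend on where the element sits in the hierarchy.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id='" + id + "']");
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        // Illustrative markup; only the id value is taken from the post.
        string html = "<html><body><div><div id='prop_detail_qt_prop_1'> Detail text </div></div></body></html>";

        Console.WriteLine(GetTextById(html, "prop_detail_qt_prop_1"));
        Console.WriteLine(GetTextById(html, "no_such_id") == null);
    }
}
```

<p>As with SelectNodes, SelectSingleNode returns null when nothing matches, so the null check is needed before reading InnerText.</p>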
</span></span></span></span><img src ="http://www.cppblog.com/micwu/aggbug/160203.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/micwu/" target="_blank">micwu</a> 2011-11-15 23:10 <a href="http://www.cppblog.com/micwu/archive/2011/11/15/160203.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>[Tool] PowerGREP</title><link>http://www.cppblog.com/micwu/archive/2011/11/14/160091.html</link><dc:creator>micwu</dc:creator><author>micwu</author><pubDate>Mon, 14 Nov 2011 08:21:00 GMT</pubDate><guid>http://www.cppblog.com/micwu/archive/2011/11/14/160091.html</guid><wfw:comment>http://www.cppblog.com/micwu/comments/160091.html</wfw:comment><comments>http://www.cppblog.com/micwu/archive/2011/11/14/160091.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/micwu/comments/commentRss/160091.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/micwu/services/trackbacks/160091.html</trackback:ping><description><![CDATA[Publisher&#8217;s notes: PowerGREP is a powerful Windows grep tool. Quickly search through large numbers of files on your PC or network, including text and binary files, compressed archives, MS Word documents, Excel spreadsheets, <span class="caps">PDF</span> files, OpenOffice files, etc. Find the information you want with powerful text patterns (regular expressions) specifying the form of what you want, instead of literal text. Search and replace with one or many regular expressions to comprehensively maintain web sites, source code, reports, etc. Extract statistics and knowledge from log files and large data sets. <br />&#8226; Learn how you can find information faster and comprehensively maintain large sets of files with PowerGREP. 
<br /><br /><img border="0" alt="Image" src="http://www.pcworld.idg.com.au/downloads/images/118x136/dimg/Just_Great_Software_Logo.png" /><br /><br /><a href="http://www.pcworld.idg.com.au/downloads/product/299/powergrep/">http://www.pcworld.idg.com.au/downloads/product/299/powergrep/</a><br /><br />How to use (regular expressions):<br />1. Include the folders and subfolders that you want to search.<br />2. Enter the file types (such as *.cpp) in Include files.<br />3. You can also filter by File Modification Dates and File Sizes.<br />4. Select the Action tab, then fill in the regular expressions.<br />5. Click Search (Ctrl+F9).<br /><br />Thanks,<br />micwu. <img src ="http://www.cppblog.com/micwu/aggbug/160091.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/micwu/" target="_blank">micwu</a> 2011-11-14 16:21 <a href="http://www.cppblog.com/micwu/archive/2011/11/14/160091.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>