﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++ Blog - &lt;a href=http://minidx.com&gt;Full-Text Search (http://minidx.com)&lt;/a&gt; - Post Category - Minidx Full-Text Search</title><link>http://www.cppblog.com/minidxer/category/4995.html</link><description>&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;
&lt;/script&gt;</description><language>zh-cn</language><lastBuildDate>Wed, 21 May 2008 00:03:45 GMT</lastBuildDate><pubDate>Wed, 21 May 2008 00:03:45 GMT</pubDate><ttl>60</ttl><item><title>C++ Demo (VS2005 Project) for Extracting Text from doc, Xls, Pdf, and Other Files with the Minidx Component</title><link>http://www.cppblog.com/minidxer/archive/2008/01/10/40851.html</link><dc:creator>minidxer</dc:creator><author>minidxer</author><pubDate>Thu, 10 Jan 2008 01:24:00 GMT</pubDate><guid>http://www.cppblog.com/minidxer/archive/2008/01/10/40851.html</guid><wfw:comment>http://www.cppblog.com/minidxer/comments/40851.html</wfw:comment><comments>http://www.cppblog.com/minidxer/archive/2008/01/10/40851.html#Feedback</comments><slash:comments>8</slash:comments><wfw:commentRss>http://www.cppblog.com/minidxer/comments/commentRss/40851.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/minidxer/services/trackbacks/40851.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: After I released the <a href="http://blog.minidx.com/2007/12/31/334.html">VB.net Demo</a>, many people emailed to ask how to call the component from C++. There were too many emails to answer one by one, so I quickly wrote a C++ sample instead. Usage is much the same as in VB.net. The code can be downloaded <a href="http://cn.minidx.com/index.php?option=com_docman&task=cat_view&gid=17">here</a> (in the Related Documents category, under "COM component and demo (VC++) source code for extracting text from Doc, Xls, Pdf, and other files"); see <a href="http://blog.minidx.com/2008/01/10/373.html">this post</a> for details. Below are the calling code (which can be called directly from both C and C++) and screenshots:&nbsp;&nbsp;<a href='http://www.cppblog.com/minidxer/archive/2008/01/10/40851.html'>Read more</a><img src ="http://www.cppblog.com/minidxer/aggbug/40851.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/minidxer/" target="_blank">minidxer</a> 2008-01-10 09:24 <a href="http://www.cppblog.com/minidxer/archive/2008/01/10/40851.html#Feedback" target="_blank" 
style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Analysis Document for "Ultra-Fast Chinese, Japanese, and English Word Segmentation (10MB/S)" Available for Download</title><link>http://www.cppblog.com/minidxer/archive/2008/01/01/39643.html</link><dc:creator>minidxer</dc:creator><author>minidxer</author><pubDate>Tue, 01 Jan 2008 00:05:00 GMT</pubDate><guid>http://www.cppblog.com/minidxer/archive/2008/01/01/39643.html</guid><wfw:comment>http://www.cppblog.com/minidxer/comments/39643.html</wfw:comment><comments>http://www.cppblog.com/minidxer/archive/2008/01/01/39643.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/minidxer/comments/commentRss/39643.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/minidxer/services/trackbacks/39643.html</trackback:ping><description><![CDATA[<span style="color: red;">&nbsp;&nbsp;&nbsp; Update 2008/1/1: The problem that prevented downloads in IE6.0 has been fixed; please download again if you need the file.</span><span class="postTitle2"><span style="font-weight: bold;"></span><br>Quite a few people are interested in how "</span><a href="http://www.cppblog.com/minidxer/archive/2007/09/07/31723.html" id="_a293deb7fc2_HomePageDays_ctl00_DayList_ctl00_TitleUrl" class="postTitle2">Ultra-Fast Chinese, Japanese, and English Word Segmentation (10MB/S)</a><span class="postTitle2">" was implemented. This past September I happened to prepare some material that uses C++ code to analyze the conversions performed inside the segmenter. For the explanatory part, see "</span><a href="http://blog.minidx.com/2007/12/24/296.html" rel="bookmark">A Brief Analysis of How the Double Array Trie Works</a><span class="postTitle2">". The underlying principle is the same as that of </span><a href="http://linux.thai.net/%7Ethep/datrie/datrie.html" target="_blank">datrie</a> (English), <a href="http://mecab.sourceforge.net/" target="_blank">Mecab</a> (Japanese) from the <a href="http://pine.kuee.kyoto-u.ac.jp/KU-NTT-WS-2005/">Graduate School of Informatics, Kyoto University</a>, and <a href="http://chasen-legacy.sourceforge.jp/">chasen</a> (Japanese). All of them come with complete C/C++ source code, which anyone interested can download and study.<br>The analysis document (PDF) can be <a href="http://cn.minidx.com/index.php?option=com_docman&amp;task=cat_view&amp;gid=17&amp;Itemid=38">downloaded here</a> (<a 
href="http://cn.minidx.com/component/option,com_docman/task,doc_download/gid,48/" class="dm_icon">
</a>
<span class="dm_name">How the Double Array Trie Works</span>). <span style="font-weight: bold;">Readers need to be fairly familiar with how computers encode characters internally; otherwise the conversion process may be hard to follow</span>. I am publishing this document simply to share it, and I am of course happy to answer any questions you run into while reading it. :)<br><img src ="http://www.cppblog.com/minidxer/aggbug/39643.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/minidxer/" target="_blank">minidxer</a> 2008-01-01 08:05 <a href="http://www.cppblog.com/minidxer/archive/2008/01/01/39643.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Sharing a UTF8-Encoded Chinese Lexicon: Download It If You Need It</title><link>http://www.cppblog.com/minidxer/archive/2008/01/01/39467.html</link><dc:creator>minidxer</dc:creator><author>minidxer</author><pubDate>Tue, 01 Jan 2008 00:04:00 GMT</pubDate><guid>http://www.cppblog.com/minidxer/archive/2008/01/01/39467.html</guid><wfw:comment>http://www.cppblog.com/minidxer/comments/39467.html</wfw:comment><comments>http://www.cppblog.com/minidxer/archive/2008/01/01/39467.html#Feedback</comments><slash:comments>6</slash:comments><wfw:commentRss>http://www.cppblog.com/minidxer/comments/commentRss/39467.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/minidxer/services/trackbacks/39467.html</trackback:ping><description><![CDATA[&nbsp;&nbsp; <span style="color: red;">Update 2008/1/1: The problem that prevented downloads in IE6.0 has been fixed; please download again if you need the file.</span><br>I had not planned to publish this post, but just one week after "<a href="http://blog.minidx.com/2007/12/12/241.html">UTF8-Encoded Chinese Lexicon Download</a>" went up, it had already been downloaded more than 2,200 times, and every day many visitors arrive from search engines looking for a Chinese lexicon. Evidently quite a few people need one. If you want to write your own word segmentation system, or need the lexicon for something else, you can download it directly <a href="http://cn.minidx.com/index.php?option=com_docman&amp;task=cat_view&amp;gid=17">here</a>, under &#8220;<a href="http://cn.minidx.com/index.php?option=com_docman&amp;task=cat_view&amp;gid=17" class="dm_icon"><img src="http://cn.minidx.com/components/com_docman/themes/default/images/icons/32x32/folder.png" alt="folder icon" border="0"></a>	<span class="dm_name">Related Documents</span>&#8221;<span class="dm_name">.</span><img src 
="http://www.cppblog.com/minidxer/aggbug/39467.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/minidxer/" target="_blank">minidxer</a> 2008-01-01 08:04 <a href="http://www.cppblog.com/minidxer/archive/2008/01/01/39467.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Component for Reading Text Directly from Word, Xls, Pdf, and Other Files (Pure C++ COM DLL), with VB.net Demo Source Code Download</title><link>http://www.cppblog.com/minidxer/archive/2007/12/31/40064.html</link><dc:creator>minidxer</dc:creator><author>minidxer</author><pubDate>Mon, 31 Dec 2007 14:45:00 GMT</pubDate><guid>http://www.cppblog.com/minidxer/archive/2007/12/31/40064.html</guid><wfw:comment>http://www.cppblog.com/minidxer/comments/40064.html</wfw:comment><comments>http://www.cppblog.com/minidxer/archive/2007/12/31/40064.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/minidxer/comments/commentRss/40064.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/minidxer/services/trackbacks/40064.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp; PS: Wishing everyone success in your studies, smooth sailing at work, and ever more happiness in 2008!<br>Reading the text content of doc, xls, and pdf files directly, without installing Word, Excel, Adobe Reader, or similar applications, is useful in many situations: a search engine crawling files in various formats for indexing, for example, or building your own text reader&#8230; This Minidx module lets you do that easily. For usage instructions, see "<a href="http://blog.minidx.com/2007/12/31/334.html" rel="bookmark">Using the Minidx Extract-Text COM Component to Read Text from Word, Xls, Pdf, and Other Files</a>". The component and demo source code can be downloaded as "<a href="http://cn.minidx.com/index.php?option=com_docman&amp;task=doc_download&amp;gid=49&amp;Itemid=38" class="dm_name">COM component and demo source code for extracting text from Doc, Xls, Pdf, and other files</a>". The component may be used for any commercial or non-commercial purpose. If you like, you can send me an email to let me know the module is being used in your project, though that is of course not required. :) Below are screenshots of text extracted from Chinese, Japanese, and English Word files:<br><img alt="" src="http://blog.minidx.com/wp-content/uploads/2007/12/result-engilsh.jpg"><br><img 
alt="" src="http://blog.minidx.com/wp-content/uploads/2007/12/select-japanese-file.jpg"><br><img alt="" src="http://blog.minidx.com/wp-content/uploads/2007/12/select-chinese-file.jpg"><img src ="http://www.cppblog.com/minidxer/aggbug/40064.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/minidxer/" target="_blank">minidxer</a> 2007-12-31 22:45 <a href="http://www.cppblog.com/minidxer/archive/2007/12/31/40064.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Minidx Dictionary Cleanup Tool Released (Source Code Included)</title><link>http://www.cppblog.com/minidxer/archive/2007/09/09/31885.html</link><dc:creator>minidxer</dc:creator><author>minidxer</author><pubDate>Sun, 09 Sep 2007 08:34:00 GMT</pubDate><guid>http://www.cppblog.com/minidxer/archive/2007/09/09/31885.html</guid><wfw:comment>http://www.cppblog.com/minidxer/comments/31885.html</wfw:comment><comments>http://www.cppblog.com/minidxer/archive/2007/09/09/31885.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://www.cppblog.com/minidxer/comments/commentRss/31885.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/minidxer/services/trackbacks/31885.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: Sorting a few thousand records and removing the duplicates can be done entirely in memory with any number of sorting algorithms, but what if there are several million records to process...&nbsp;&nbsp;<a href='http://www.cppblog.com/minidxer/archive/2007/09/09/31885.html'>Read more</a><img src ="http://www.cppblog.com/minidxer/aggbug/31885.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/minidxer/" target="_blank">minidxer</a> 2007-09-09 16:34 <a href="http://www.cppblog.com/minidxer/archive/2007/09/09/31885.html#Feedback" target="_blank" 
style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Do You Experts Still Remember These Two's Complement Formulas?</title><link>http://www.cppblog.com/minidxer/archive/2007/09/08/31823.html</link><dc:creator>minidxer</dc:creator><author>minidxer</author><pubDate>Sat, 08 Sep 2007 02:49:00 GMT</pubDate><guid>http://www.cppblog.com/minidxer/archive/2007/09/08/31823.html</guid><wfw:comment>http://www.cppblog.com/minidxer/comments/31823.html</wfw:comment><comments>http://www.cppblog.com/minidxer/archive/2007/09/08/31823.html#Feedback</comments><slash:comments>6</slash:comments><wfw:commentRss>http://www.cppblog.com/minidxer/comments/commentRss/31823.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/minidxer/services/trackbacks/31823.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: I still remember how my university compiler principles teacher kept drilling into us that "bit operations = high performance..."&nbsp;&nbsp;<a href='http://www.cppblog.com/minidxer/archive/2007/09/08/31823.html'>Read more</a><img src ="http://www.cppblog.com/minidxer/aggbug/31823.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/minidxer/" target="_blank">minidxer</a> 2007-09-08 10:49 <a href="http://www.cppblog.com/minidxer/archive/2007/09/08/31823.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Ultra-Fast Chinese, Japanese, and English Word Segmentation (10MB/S)</title><link>http://www.cppblog.com/minidxer/archive/2007/09/07/31723.html</link><dc:creator>minidxer</dc:creator><author>minidxer</author><pubDate>Thu, 06 Sep 2007 16:25:00 
GMT</pubDate><guid>http://www.cppblog.com/minidxer/archive/2007/09/07/31723.html</guid><wfw:comment>http://www.cppblog.com/minidxer/comments/31723.html</wfw:comment><comments>http://www.cppblog.com/minidxer/archive/2007/09/07/31723.html#Feedback</comments><slash:comments>17</slash:comments><wfw:commentRss>http://www.cppblog.com/minidxer/comments/commentRss/31723.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/minidxer/services/trackbacks/31723.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: I rewrote the Minidx word segmentation module and achieved ultra-fast segmentation (10MB/S) with fairly high accuracy. It actually also covers Korean, French, German, and other languages that can be displayed on a computer. For more information, visit http://minidx.com&nbsp;&nbsp;<a href='http://www.cppblog.com/minidxer/archive/2007/09/07/31723.html'>Read more</a><img src ="http://www.cppblog.com/minidxer/aggbug/31723.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/minidxer/" target="_blank">minidxer</a> 2007-09-07 00:25 <a href="http://www.cppblog.com/minidxer/archive/2007/09/07/31723.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Minidx Functional Overview Design Document (Chinese Edition) Fresh Off the Press</title><link>http://www.cppblog.com/minidxer/archive/2007/09/03/31414.html</link><dc:creator>minidxer</dc:creator><author>minidxer</author><pubDate>Sun, 02 Sep 2007 23:25:00 GMT</pubDate><guid>http://www.cppblog.com/minidxer/archive/2007/09/03/31414.html</guid><wfw:comment>http://www.cppblog.com/minidxer/comments/31414.html</wfw:comment><comments>http://www.cppblog.com/minidxer/archive/2007/09/03/31414.html#Feedback</comments><slash:comments>5</slash:comments><wfw:commentRss>http://www.cppblog.com/minidxer/comments/commentRss/31414.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/minidxer/services/trackbacks/31414.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: (Download file fixed) From: Minidx Full-Text Search (http://minidx.com)<br>The Chinese edition of the Minidx functional overview design document is ready. It is purely a product overview and user guide, with little technical depth. Thanks to Tang Jingrui (唐菁睿) while I am at it.<br>Author: Ding Zhigang (丁志刚); Translator: Tang Jingrui (唐菁睿); Proofreader: Ding Zhigang (丁志刚)&nbsp;&nbsp;<a 
href='http://www.cppblog.com/minidxer/archive/2007/09/03/31414.html'>Read more</a><img src ="http://www.cppblog.com/minidxer/aggbug/31414.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/minidxer/" target="_blank">minidxer</a> 2007-09-03 07:25 <a href="http://www.cppblog.com/minidxer/archive/2007/09/03/31414.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Minidx Functional Overview (Japanese Edition) Now Available for Download</title><link>http://www.cppblog.com/minidxer/archive/2007/08/26/30879.html</link><dc:creator>minidxer</dc:creator><author>minidxer</author><pubDate>Sun, 26 Aug 2007 12:40:00 GMT</pubDate><guid>http://www.cppblog.com/minidxer/archive/2007/08/26/30879.html</guid><wfw:comment>http://www.cppblog.com/minidxer/comments/30879.html</wfw:comment><comments>http://www.cppblog.com/minidxer/archive/2007/08/26/30879.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/minidxer/comments/commentRss/30879.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/minidxer/services/trackbacks/30879.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: My company happened to hold an internal study session where Minidx needed to be presented, so I took the opportunity to write this functional overview<br>From: Minidx Full-Text Search (http://minidx.com)&nbsp;&nbsp;<a href='http://www.cppblog.com/minidxer/archive/2007/08/26/30879.html'>Read more</a><img src ="http://www.cppblog.com/minidxer/aggbug/30879.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/minidxer/" target="_blank">minidxer</a> 2007-08-26 20:40 <a href="http://www.cppblog.com/minidxer/archive/2007/08/26/30879.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>I Wrote a Search Engine (C/C++) and a File Management System (MFC)</title><link>http://www.cppblog.com/minidxer/archive/2007/07/26/28793.html</link><dc:creator>minidxer</dc:creator><author>minidxer</author><pubDate>Wed, 25 Jul 2007 23:18:00 
GMT</pubDate><guid>http://www.cppblog.com/minidxer/archive/2007/07/26/28793.html</guid><wfw:comment>http://www.cppblog.com/minidxer/comments/28793.html</wfw:comment><comments>http://www.cppblog.com/minidxer/archive/2007/07/26/28793.html#Feedback</comments><slash:comments>27</slash:comments><wfw:commentRss>http://www.cppblog.com/minidxer/comments/commentRss/28793.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/minidxer/services/trackbacks/28793.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: It took about two years and roughly 100,000 lines of code. I wonder how everyone finds it in use&nbsp;&nbsp;<a href='http://www.cppblog.com/minidxer/archive/2007/07/26/28793.html'>Read more</a><img src ="http://www.cppblog.com/minidxer/aggbug/28793.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/minidxer/" target="_blank">minidxer</a> 2007-07-26 07:18 <a href="http://www.cppblog.com/minidxer/archive/2007/07/26/28793.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>