﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++博客-山寨科技-随笔分类-工具平台</title><link>http://www.cppblog.com/Chipset/category/9263.html</link><description /><language>zh-cn</language><lastBuildDate>Thu, 25 Nov 2021 21:50:03 GMT</lastBuildDate><pubDate>Thu, 25 Nov 2021 21:50:03 GMT</pubDate><ttl>60</ttl><item><title>A Tool to Compare Two PDF Files</title><link>http://www.cppblog.com/Chipset/archive/2021/10/15/217487.html</link><dc:creator>Chipset</dc:creator><author>Chipset</author><pubDate>Fri, 15 Oct 2021 02:39:00 GMT</pubDate><guid>http://www.cppblog.com/Chipset/archive/2021/10/15/217487.html</guid><wfw:comment>http://www.cppblog.com/Chipset/comments/217487.html</wfw:comment><comments>http://www.cppblog.com/Chipset/archive/2021/10/15/217487.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/Chipset/comments/commentRss/217487.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/Chipset/services/trackbacks/217487.html</trackback:ping><description><![CDATA[<span style="font-size: 12pt;">Too many chores have kept me busy, with no time to write anything serious. Recently I wrote a tool that compares two PDF files and marks the differences.</span><br /><span style="font-size: 12pt;">The back end parses the PDF files with Poppler, compares them, and generates three HTML files that can be opened in Chrome or Firefox (Edge and IE do not work)<br />to view the differences. </span><span style="font-size: 12pt;">Poppler is free and open source, but it has countless bugs and is truly hard to use. Oh well, since it costs nothing, I put up with it...<br /><br /></span><span style="font-size: 12pt;">The source is about 30,000 lines and the program depends on 33 DLLs at run time. The algorithm is somewhat complex, and although the result is still far from what I want, a prototype has finally taken shape.</span><br /><br />The screenshot below shows the result of a simple comparison of two PDF files. Clicking an entry in the rightmost column marks the change and automatically aligns both sides on it; additions and deletions are mapped to their original positions.<br /><br /><div><img src="http://www.cppblog.com/images/cppblog_com/chipset/1.png" width="800" height="420" alt="" /><br /><br />Below is a partial screenshot of the logical alignment of text changes. Hovering the mouse over text whose attributes changed pops up the differences, shown as green text on a black background.<br /><br /><img src="http://www.cppblog.com/images/cppblog_com/chipset/2.png" 
width="800" height="419" alt="" longdesc="text.html" /><br /><br />Below is a screenshot of comparing raster images alone. Clicking an entry in the rightmost column aligns the views automatically; added or deleted images are mapped to their original positions, and for a changed image every changed spot is marked.<br /><br /><img src="http://www.cppblog.com/images/cppblog_com/chipset/3.png" width="800" height="422" alt="" /></div><span style="font-size: 12pt;"><br />That is roughly what the tool looks like in the three screenshots above (vector graphics comparison is still missing); it should meet ordinary PDF document comparison needs.<br /></span><span style="font-size: 12pt;"><br /></span><span style="font-size: 12pt;">Visitors can download a simple test case to their computer, unzip it </span><span style="font-size: 16px;">[<a href="https://www.7-zip.org/">7z</a> is one option]</span><span style="font-size: 12pt;">, and open it in Chrome (or Firefox; avoid Edge and IE)<br />to see the result. </span><span style="font-size: 12pt;">Download it here: </span><span style="font-size: 12pt;"><a href="/Files/Chipset/test.zip" style="font-weight: bold; font-size: 12pt;">/Files/Chipset/test.zip</a></span><span style="font-size: 12pt;"><br /></span><br /><span style="font-size: 12pt;">The attachment above is small but complete, with essentially everything it should have. A single PDF file of more than 1,000 pages can be compared, and quickly.<br />The pages shown above are rendered as text laid over a pasted background image. In the future </span><span style="font-size: 16px;">[upgrade in progress], PDF parsing</span><span style="font-size: 12pt;"> will switch from the GPL-licensed Poppler to the BSD-licensed pdfium,<br /></span><span style="font-size: 16px;">and page display will rely on PDF.js, </span><span style="font-size: 12pt;">dropping the text-over-background approach. After that I plan a standalone desktop interface that no longer needs a browser. That is the upgrade direction.<br /><br /></span><span style="font-size: 12pt;">A few more words about comparison tools.<br />Among content comparison tools, </span><a href="https://draftable.com/compare"><span style="font-size: 12pt;">Draftable</span></a><span style="font-size: 12pt;"> is very successful. It is written in C# and can compare Office and PDF files, mixing formats freely, but it compares text only.<br />Mine compares only PDF files, covering text and raster images. Kiwi free pdf comparer is similar, except that it requires a Java runtime.<br />In addition, ABBYY FineReader and Adobe Acrobat also have comparison features, </span><span style="font-size: 16px;">though their accuracy and speed are noticeably weaker. </span><span style="font-size: 12pt;">As for visual<br 
/>comparison tools, since they depend on position, they are nearly useless in most situations. A web search turns up plenty of them, many free. Plain-text comparison<br />tools are even more numerous and just as easy to find online, so I will not go on about them here.<br /></span><img src ="http://www.cppblog.com/Chipset/aggbug/217487.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/Chipset/" target="_blank">Chipset</a> 2021-10-15 10:39 <a href="http://www.cppblog.com/Chipset/archive/2021/10/15/217487.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Compiling and Installing Poppler on Windows</title><link>http://www.cppblog.com/Chipset/archive/2021/10/15/217526.html</link><dc:creator>Chipset</dc:creator><author>Chipset</author><pubDate>Fri, 15 Oct 2021 02:36:00 GMT</pubDate><guid>http://www.cppblog.com/Chipset/archive/2021/10/15/217526.html</guid><wfw:comment>http://www.cppblog.com/Chipset/comments/217526.html</wfw:comment><comments>http://www.cppblog.com/Chipset/archive/2021/10/15/217526.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/Chipset/comments/commentRss/217526.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/Chipset/services/trackbacks/217526.html</trackback:ping><description><![CDATA[<p><span style="font-family: Arial; font-size: 12pt;">Compiling poppler on Linux is easy, so I will not dwell on it. Methods for compiling poppler on Windows are all over the web. In my tests one of them (vcpkg) did succeed, but the Release binaries it produced were large, and it could not build the Qt and GLib versions. </span><span style="font-family: Arial; font-size: 12pt;">This post describes another method for those who come after, in the hope of saving you some time.</span></p><p><span style="font-family: Arial; font-size: 12pt;"><br /></span><span style="font-size: 12pt; font-family: Arial;">Tested OK on English-language Windows (Edition: Windows 10 Pro, Version: 20H2, Build: 19042.572). </span><span style="font-size: 12pt; font-family: Arial;">Also tested OK on 64-bit Windows 7 Ultimate SP1 in a virtual machine. </span><span 
style="font-size: 12pt; font-family: Arial;">In a 32-bit Windows XP Pro SP2 virtual machine the build succeeds, but the result is unusable. If you are still on Windows XP, there is no need to read on.</span><span style="font-size: 12pt;"></span></p><p><span style="font-family:DengXian;"><br /></span><span style="font-family: Arial; font-size: 12pt;">Free SDKs for parsing PDF files include Poppler, MuPDF, and PDFium. Of the three, the last is under the Apache license, the most permissive, and has the fewest bugs, so it is the best choice. Unfortunately, downloading anything from Google is hard in China. Prebuilt binaries are available on GitHub, though with an embedded signature; for general use they are a good option [<a href="https://github.com/bblanchon/pdfium-binaries">https://github.com/bblanchon/pdfium-binaries</a>]. MuPDF must either be purchased or used under the AGPL, which rules it out for most people [<a href="https://mupdf.com/downloads/index.html">https://mupdf.com/downloads/index.html</a>]. That leaves only Poppler. It is GPL-licensed, but it is easy to obtain, and commercial use is possible when it is accessed through inter-process communication. Of course, a student who is just practicing need not worry about licensing, and any of the three will do.<br /><br /></span></p>  <p><span style="font-family: Arial; font-size: 12pt;">Installing </span><span style="font-size: 12pt; font-family: 
Arial;">poppler</span><span style="font-family: Arial; font-size: 12pt;"> is easiest through msys2 (cygwin may work too). First install msys2 [choose the 64-bit version], then install the development toolchain and poppler's dependencies (cairo, iconv, glib, boost, lcms2, zlib, png, tiff, jpeg, and many more). If you use Qt, install Qt as well; if you use GTK, install GTK. In fact, msys2 already ships poppler</span><span 
style="font-size: 12pt; font-family: Arial;"> (a mingw-built version), so you can simply install it. But if you need it for a particular environment such as Visual Studio, or find the poppler in msys2 unsatisfactory [in my tests it was not very good], you have to compile it yourself. Instructions for installing msys2 and a development environment are everywhere online, so I will skip them here.<br /><br /></span><span style="font-family: Arial; font-size: 12pt;">Some brave souls may want to build poppler entirely on their own. I do not mean to discourage anyone, but poppler depends on dozens of DLLs, and gathering all of those dependencies and building each one yourself is simply unrealistic: building the dependencies requires a pile of further dependencies, network problems make it hard to download them all, and even once downloaded, their build procedures vary widely and some will not build by conventional means at all. I once built Poppler 0.59 this way and had to modify a lot of source code before it compiled into a usable version. Remembering how painful that was, I used poppler 0.59 for a year without upgrading...<br /><br /></span><span style="font-family: Arial; font-size: 12pt;">After installing msys2, add the MinGW compiler and make (probably mingw32-make) to PATH; the directory is usually ***/mingw64/bin<br /><br /></span></p>  <p><span style="font-family: Arial; font-size: 12pt;">Next, install CMake, which can be downloaded from </span><a href="https://cmake.org/files/"><span style="font-size: 12pt; font-family: Arial;">https://cmake.org/files/</span></a><span style="font-family: DengXian; font-size: 12pt;"><br /><br /></span></p>  <p><span style="font-family: Arial; font-size: 12pt;">Download </span><span style="font-size: 12pt; font-family: Arial;">Poppler</span><span style="font-family: Arial; 
font-size: 12pt;"> from </span><span style="font-size: 12pt;"><a href="https://poppler.freedesktop.org/"><span style="font-family: Arial; font-size: 12pt;">https://poppler.freedesktop.org/</span></a>&nbsp;</span><span style="font-family: Arial; font-size: 12pt;">Do not forget the fonts: poppler-data is needed too, and should be installed after poppler.<br /><br /></span></p>  <p><span style="font-family: Arial; font-size: 12pt;">poppler can be compiled with Visual Studio or with mingw [installed earlier through msys2]. The former requires vcpkg, driven by PowerShell commands. Because of the pile of dependent libraries, vcpkg must download and build each dependency before building poppler. In theory this works, but in practice it rarely succeeds, for the same reason as the build-everything-yourself idea above: some of the dependencies are too hard to download [perhaps because my network here is too poor</span><span style="font-size: 12pt; font-family: 
Arial;">?]. Installing the dependencies through msys2 and compiling poppler with mingw is far easier.<br /><br /></span></p>  <p><span style="font-family: Arial; font-size: 12pt;">With recent poppler releases only the 64-bit build is usable. A 32-bit poppler also compiles successfully, but the result cannot be used on a 32-bit system because some of the dependent libraries do not run there; even a 32-bit poppler built on a 32-bit system does not work. I tested this many times in virtual machines. So the once popular but aging 32-bit Windows XP can no longer use poppler. Fortunately, few people use XP these days.<br /><br /></span></p>  <p><span style="font-family: Arial; font-size: 12pt;">After downloading the poppler source, unpack it (7-Zip works), then open </span><span style="font-size: 12pt; font-family: Arial;">cmake-gui</span><span style="font-family: Arial; font-size: 
12pt;">, choose the source directory and the build directory, and compile. If you want .lib files, remember to tick the CMAKE_GNUtoMS option. It is best to specify a Release build, since the default Debug build is huge. The install path must not contain spaces, because installation uses mingw32-make [has anyone ever seen a mingw64-make?], which does not support spaces in paths. Unless your hardware is ancient, enable parallel jobs when compiling: my CPU is an i7-8700, and running mingw32-make -j12 straight from the command line takes a few minutes.</span><br /><br /><span style="font-size: 12pt; font-family: Arial;">In the dialog below, do not choose Visual Studio ***; choose MinGW Makefiles.</span><br /><img src="http://www.cppblog.com/images/cppblog_com/chipset/poppler1.png" alt="" width="755" height="620" /><br /><span style="font-size: 12pt;"><br />Tick whichever options you need. If Configure fails, go back to msys2, install what is missing, and try again; this may take several rounds.<br />Specify a Release build before compiling. If you do not set it by hand, a Debug build is produced by default, and the resulting DLLs are huge!</span></p>  <p>&nbsp;<img src="http://www.cppblog.com/images/cppblog_com/chipset/poppler2.png" alt="" width="751" height="847" /><br /><img src="http://www.cppblog.com/images/cppblog_com/chipset/poppler3.png" alt="" width="758" height="848" /><br /><br /><span style="font-size: 12pt;">If CMAKE_GNUtoMS was ticked earlier, the build also uses Visual Studio, and the compiler version must not be too old.</span><br /><img src="http://www.cppblog.com/images/cppblog_com/chipset/poppler4.png" alt="" /><br /><br /><span style="font-size: 12pt;">After the build, install with the command mingw32-make install, which installs into the path specified earlier, c:/poppler/vc. Then install poppler-data<br />in the same way; since it has no dependencies, it is much easier.</span><br /></p><p><span style="font-size: 12pt;"><br 
/>Since I do not use Visual Studio, I do not know whether a build made with CMAKE_GNUtoMS ticked is usable from Visual Studio. However,<br />the version built with MinGW without that option works without any problems; the poppler I use to parse PDF files in my comparison tool was built with the method above.<br /></span></p><p><a href="http://www.cppblog.com/Chipset/archive/2020/10/23/217487.html">http://www.cppblog.com/Chipset/archive/2020/10/23/217487.html</a></p><img src ="http://www.cppblog.com/Chipset/aggbug/217526.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/Chipset/" target="_blank">Chipset</a> 2021-10-15 10:36 <a href="http://www.cppblog.com/Chipset/archive/2021/10/15/217526.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>