﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++ Blog - Shihira - Post category - Cross-platform programming</title><link>http://www.cppblog.com/Shihira/category/20514.html</link><description>Open source - 开放源代码 - 開放原始碼 - オープンソース - 오픈 소스 - Отворен код - متن‌باز</description><language>zh-cn</language><lastBuildDate>Sat, 16 Aug 2014 13:14:54 GMT</lastBuildDate><pubDate>Sat, 16 Aug 2014 13:14:54 GMT</pubDate><ttl>60</ttl><item><title>TypeGame: adding typing practice to Vim</title><link>http://www.cppblog.com/Shihira/archive/2014/08/16/typegame-vim.html</link><dc:creator>Shihira</dc:creator><author>Shihira</author><pubDate>Sat, 16 Aug 2014 13:11:00 GMT</pubDate><guid>http://www.cppblog.com/Shihira/archive/2014/08/16/typegame-vim.html</guid><wfw:comment>http://www.cppblog.com/Shihira/comments/208039.html</wfw:comment><comments>http://www.cppblog.com/Shihira/archive/2014/08/16/typegame-vim.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/Shihira/comments/commentRss/208039.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/Shihira/services/trackbacks/208039.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: This is a tiny Vim plugin written by the author of this post, so small that it does not even warrant a GitHub project of its own. Anyone who has used Kingsoft Typing will find it familiar: just two lines, the reference text on top and your typing underneath. This Vim script imitates that layout, which is why, despite being called a Game, it is actually rather dull (orz).&nbsp;&nbsp;<a href='http://www.cppblog.com/Shihira/archive/2014/08/16/typegame-vim.html'>Read more</a><img src ="http://www.cppblog.com/Shihira/aggbug/208039.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/Shihira/" target="_blank">Shihira</a> 2014-08-16 21:11 <a href="http://www.cppblog.com/Shihira/archive/2014/08/16/typegame-vim.html#Feedback" target="_blank" 
style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Windows API character encoding conversion, with some explanations and lessons learned</title><link>http://www.cppblog.com/Shihira/archive/2013/10/28/200124.html</link><dc:creator>Shihira</dc:creator><author>Shihira</author><pubDate>Mon, 28 Oct 2013 14:49:00 GMT</pubDate><guid>http://www.cppblog.com/Shihira/archive/2013/10/28/200124.html</guid><wfw:comment>http://www.cppblog.com/Shihira/comments/200124.html</wfw:comment><comments>http://www.cppblog.com/Shihira/archive/2013/10/28/200124.html#Feedback</comments><slash:comments>8</slash:comments><wfw:commentRss>http://www.cppblog.com/Shihira/comments/commentRss/200124.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/Shihira/services/trackbacks/200124.html</trackback:ping><description><![CDATA[<div>
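<p><span style="font-size: 12pt;">I took quite a few detours while fixing garbled text: lots of experiments, lots of reading. These are my notes, in the hope that whoever comes after me understands the problem and takes fewer detours.</span></p>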
<p>
</p>
<h3>Starting with the two most familiar character encodings<br />
</h3>
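<p>
<span style="font-size: 12pt;">Apart from some old pages that never considered compatibility and still use GBK, most web pages are now encoded in UTF-8. The real headache is that the Windows console handles UTF-8 output poorly. Some kind soul has written two functions for C++ that do the conversion; they work and are easy to use <strike>(except that returning std::string by value makes efficiency fanatics like me a little uneasy&#8230;&#8230; still, they are quite convenient)</strike>.</span></p>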
</div>
<div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><!--<br />
<br />
Code highlighting produced by Actipro CodeHighlighter (freeware)<br />
http://www.CodeHighlighter.com/<br />
<br />
--><span style="color: #000000; ">#include&nbsp;&lt;cstring&gt;</span><br />
<span style="color: #000000; ">#include&nbsp;&lt;string&gt;</span><br />
<span style="color: #000000; ">#include&nbsp;&lt;windows.h&gt;</span><br />
<span style="color: #0000FF; ">using</span><span style="color: #000000; ">&nbsp;std::string;</span><br />
<br />
<span style="color: #008000; ">// Code page 936 is GBK. CP_ACP is the system ANSI code page, which is GBK only on Simplified Chinese Windows.</span><br />
<span style="color: #000000; ">const&nbsp;UINT&nbsp;kCodePageGBK&nbsp;=&nbsp;936;</span><br />
<br />
<span style="color: #008000; ">// GBK to UTF-8</span><br />
<span style="color: #000000; ">string&nbsp;GBKToUTF8(const string&amp;&nbsp;strGBK)</span><br />
<span style="color: #000000; ">{</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;string&nbsp;strOutUTF8&nbsp;=&nbsp;"";</span><br />
<span style="color: #008000; ">&nbsp;&nbsp;&nbsp;&nbsp;// First call: ask for the required length in wide characters (terminator included)</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;n&nbsp;=&nbsp;MultiByteToWideChar(kCodePageGBK,&nbsp;0,&nbsp;strGBK.c_str(),&nbsp;-1,&nbsp;NULL,&nbsp;0);</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;if&nbsp;(n&nbsp;&lt;=&nbsp;0)&nbsp;return&nbsp;strOutUTF8;&nbsp;</span><span style="color: #008000; ">// the input could not be converted</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;WCHAR&nbsp;*&nbsp;str1&nbsp;=&nbsp;new&nbsp;WCHAR[n];</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;MultiByteToWideChar(kCodePageGBK,&nbsp;0,&nbsp;strGBK.c_str(),&nbsp;-1,&nbsp;str1,&nbsp;n);</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;n&nbsp;=&nbsp;WideCharToMultiByte(CP_UTF8,&nbsp;0,&nbsp;str1,&nbsp;-1,&nbsp;NULL,&nbsp;0,&nbsp;NULL,&nbsp;NULL);</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;char&nbsp;*&nbsp;str2&nbsp;=&nbsp;new&nbsp;char[n];</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;WideCharToMultiByte(CP_UTF8,&nbsp;0,&nbsp;str1,&nbsp;-1,&nbsp;str2,&nbsp;n,&nbsp;NULL,&nbsp;NULL);</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;strOutUTF8&nbsp;=&nbsp;str2;</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;delete[]&nbsp;str1;</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;delete[]&nbsp;str2;</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;return&nbsp;strOutUTF8;</span><br />
<span style="color: #000000; ">}</span><br />
<br />
<span style="color: #008000; ">// UTF-8 to GBK</span><br />
<span style="color: #000000; ">string&nbsp;UTF8ToGBK(const string&amp;&nbsp;strUTF8)</span><br />
<span style="color: #000000; ">{</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;len&nbsp;=&nbsp;MultiByteToWideChar(CP_UTF8,&nbsp;0,&nbsp;strUTF8.c_str(),&nbsp;-1,&nbsp;NULL,&nbsp;0);</span><br />
<span style="color: #008000; ">&nbsp;&nbsp;&nbsp;&nbsp;// WCHAR rather than unsigned short: wchar_t is a distinct type in C++</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;WCHAR&nbsp;*&nbsp;wszGBK&nbsp;=&nbsp;new&nbsp;WCHAR[len&nbsp;+&nbsp;1];</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;memset(wszGBK,&nbsp;0,&nbsp;(len&nbsp;+&nbsp;1)&nbsp;*&nbsp;sizeof(WCHAR));</span><br />
<span style="color: #008000; ">&nbsp;&nbsp;&nbsp;&nbsp;// No LPCTSTR cast here: the input parameter is always LPCSTR, even in a UNICODE build</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;MultiByteToWideChar(CP_UTF8,&nbsp;0,&nbsp;strUTF8.c_str(),&nbsp;-1,&nbsp;wszGBK,&nbsp;len);</span><br />
<br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;len&nbsp;=&nbsp;WideCharToMultiByte(kCodePageGBK,&nbsp;0,&nbsp;wszGBK,&nbsp;-1,&nbsp;NULL,&nbsp;0,&nbsp;NULL,&nbsp;NULL);</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;char&nbsp;*szGBK&nbsp;=&nbsp;new&nbsp;char[len&nbsp;+&nbsp;1];</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;memset(szGBK,&nbsp;0,&nbsp;len&nbsp;+&nbsp;1);</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;WideCharToMultiByte(kCodePageGBK,&nbsp;0,&nbsp;wszGBK,&nbsp;-1,&nbsp;szGBK,&nbsp;len,&nbsp;NULL,&nbsp;NULL);</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;std::string&nbsp;strTemp(szGBK);</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;delete[]&nbsp;szGBK;</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;delete[]&nbsp;wszGBK;</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;&nbsp;&nbsp;return&nbsp;strTemp;</span><br />
<span style="color: #000000; ">}</span></div>
<p>
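<span style="font-size: 12pt;">This code is not cross-platform, because it calls the Windows API. I filed it under cross-platform programming anyway, because character encoding only becomes a real pain once you go cross-platform.</span></p>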
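<p><span style="font-size: 12pt;">For comparison, here is a minimal sketch of the same decode-then-encode route outside the Windows API, written in Python 3. The function names gbk_to_utf8 and utf8_to_gbk are made up for this illustration, and Python's built-in "gbk" codec stands in for code page 936; treat it as an assumption-laden sketch rather than a drop-in equivalent of the C++ above.</span></p>

```python
def gbk_to_utf8(data):
    """Decode GBK bytes to a Unicode string, then encode that string as UTF-8."""
    return data.decode("gbk").encode("utf-8")


def utf8_to_gbk(data):
    """Decode UTF-8 bytes to a Unicode string, then encode that string as GBK."""
    return data.decode("utf-8").encode("gbk")


gbk_bytes = b"\xb3\xcc\xd0\xf2"        # the word used later in this post, as GBK bytes
utf8_bytes = gbk_to_utf8(gbk_bytes)    # same two characters, now as UTF-8 bytes
print(utf8_bytes)
print(utf8_to_gbk(utf8_bytes) == gbk_bytes)   # the round trip restores the input
```

<p><span style="font-size: 12pt;">Both directions pass through a Unicode string in the middle, which is exactly the role the WCHAR buffer plays in the C++ functions.</span></p>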
<br />
<h3>Next, shouldn't I introduce those two functions?</h3>
<div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><!--<br />
<br />
Code highlighting produced by Actipro CodeHighlighter (freeware)<br />
http://www.CodeHighlighter.com/<br />
<br />
--><span style="color: #0000FF; ">int</span><span style="color: #000000; ">&nbsp;MultiByteToWideChar(</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_In_&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;UINT&nbsp;CodePage,&nbsp;</span><span style="color: #008000; ">/* "Code page" is the Windows term for a character encoding: GBK is 936, UTF-8 is 65001 (CP_UTF8), CP_ACP is the system default ANSI code page */</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_In_&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;DWORD&nbsp;dwFlags,&nbsp;</span><span style="color: #008000; ">/* Option flags for the conversion; 0 is fine for ordinary use */</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_In_&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;LPCSTR&nbsp;lpMultiByteStr,&nbsp;</span><span style="color: #008000; ">/* The multi-byte string */</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_In_&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000FF; ">int</span><span style="color: #000000; ">&nbsp;cbMultiByte,&nbsp;</span><span style="color: #008000; ">/* Number of bytes to process; with -1 the function processes the whole null-terminated string, terminator included */</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_Out_opt_&nbsp;&nbsp;LPWSTR&nbsp;lpWideCharStr,&nbsp;</span><span style="color: #008000; ">/* Output buffer for the wide string; may be NULL when cchWideChar is 0 */</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_In_&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000FF; ">int</span><span style="color: #000000; ">&nbsp;cchWideChar&nbsp;</span><span style="color: #008000; ">/* Size of the output buffer in wide characters; pass 0 and the function returns the required size without writing anything */</span><br />
<span style="color: #000000; ">);</span><br />
<br />
<span style="color: #0000FF; ">int</span><span style="color: #000000; ">&nbsp;WideCharToMultiByte(</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_In_&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;UINT&nbsp;CodePage,</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_In_&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;DWORD&nbsp;dwFlags,</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_In_&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;LPCWSTR&nbsp;lpWideCharStr,</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_In_&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000FF; ">int</span><span style="color: #000000; ">&nbsp;cchWideChar,</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_Out_opt_&nbsp;&nbsp;LPSTR&nbsp;lpMultiByteStr,</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_In_&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000FF; ">int</span><span style="color: #000000; ">&nbsp;cbMultiByte,&nbsp;</span><span style="color: #008000; ">/* Everything up to here mirrors MultiByteToWideChar, so no further explanation */</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_In_opt_&nbsp;&nbsp;&nbsp;LPCSTR&nbsp;lpDefaultChar,&nbsp;</span><span style="color: #008000; ">/* Pass NULL (required when CodePage is CP_UTF8) */</span><br />
<span style="color: #000000; ">&nbsp;&nbsp;_Out_opt_&nbsp;&nbsp;LPBOOL&nbsp;lpUsedDefaultChar&nbsp;</span><span style="color: #008000; ">/* Pass NULL (required when CodePage is CP_UTF8) */</span><br />
<span style="color: #000000; ">);</span><br />
</div>
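<p><span style="font-size: 12pt;">These two functions convert a multi-byte string to a wide-character string and a wide-character string to a multi-byte string, respectively (to the readers who fainted at this point: I am not the one who wronged you&#8230;&#8230; M$ is). I have said before that the Windows API has an unfriendly interface: so many parameters whose purpose is far from obvious, and for a plain conversion most of them are simply 0 or NULL. iconv() gets by with five, because the encodings are chosen once in iconv_open(); that is the better example to follow.</span></p>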
<p><br />
</p>
<h3>Wide characters? Multi-byte?</h3>
<p><span style="font-size: 12pt;">These are the names Windows gives them, and they leave people scratching their heads.</span></p>
<p>
</p>
<p>
</p>
<ul>
     <li>
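     <p><span style="font-size: 12pt;">Wide characters: this means <strong style="font-size: 12pt;">Unicode</strong>, which Windows stores as UTF-16 in 2-byte units. A single unit (0x0000 - 0xFFFF) covers every character we normally come across; characters beyond that range take two units (a surrogate pair). For the full table see: <a href="http://unicode-table.com">http://unicode-table.com</a></span></p>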
     </li>
</ul>
<p>
</p>
<ul>
     <li>
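     <p><span style="font-size: 12pt;">Multi-byte: everything <strong style="font-size: 12pt;">else</strong>, that is, any encoding stored as a run of bytes in a char string and identified by a code page. The familiar GBK, UTF-8 and Big5 all count as multi-byte (UTF-8 encodes Unicode too, but to the Windows API it is just one more code page).</span></p>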
     </li>
</ul>
<p>
</p>
<p>
</p>
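<p><span style="font-size: 12pt;">A wide character is called wide simply because it is a wider character. So what is a narrow character&#8230;&#8230; that would be ASCII: 1 byte per character is as narrow as it gets, and it covers only 128 characters (256 with a Western European single-byte code page). A wide character takes 2 bytes per character for everything in the range above: wider, but you can still tell where each character starts and ends. A multi-byte string is a run of bytes in which one character may take one byte or several, so you cannot tell where a character starts or ends without decoding from the beginning of the string.</span></p>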
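<p><span style="font-size: 12pt;">The width difference is easy to check for yourself. The Python 3 sketch below (the helper name encoded_length and the three sample characters are chosen only for this illustration) counts how many bytes one character occupies in UTF-8, GBK and UTF-16LE, including one character outside the 0x0000 - 0xFFFF range.</span></p>

```python
samples = {
    "Latin letter": "A",
    "Chinese character in the 0x0000-0xFFFF range": "\u7a0b",
    "character above 0xFFFF": "\U0001F600",
}
encodings = ("utf-8", "gbk", "utf-16-le")


def encoded_length(ch, encoding):
    """Return the number of bytes ch takes in encoding, or None if it cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


for label, ch in samples.items():
    # One row per character: byte count in each encoding
    sizes = {enc: encoded_length(ch, enc) for enc in encodings}
    print(label, sizes)
```

<p><span style="font-size: 12pt;">In the multi-byte encodings the byte count varies from character to character, while UTF-16LE stays at 2 bytes until a character falls outside the 16-bit range and needs a second unit.</span></p>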
<p>
</p>
<p><span style="font-size: 12pt;">I do not like the names of these two functions. Following Python's naming, </span><span style="color: #000000; font-size: 12pt;">MultiByteToWideChar should be called decode, and WideCharToMultiByte should be called encode.</span></p>
<p><br />
</p>
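<h3>So what?</h3>
<p><span style="font-size: 12pt;">As you can see, character boundaries in a multi-byte string cannot be read off directly, which makes processing awkward. Wide characters usually cost a little more space than multi-byte, but they are convenient to process. Take regular expressions: the engine matches character by character. With wide characters it can step through the text two bytes at a time, whereas on multi-byte data it steps byte by byte and can report false matches.</span></p>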
<p>
</p>
<p><span style="font-size: 12pt;">For example, take the word &#8220;</span><span style="color: #008080; font-size: 12pt;">程序</span><span style="font-size: 12pt;">&#8221; ("program") = 0xB3</span><span style="color: #0000ff; font-size: 12pt;">CCD0</span><span style="font-size: 12pt;">F2 (GBK). If I search it for &#8220;</span><span style="color: #008080; font-size: 12pt;">绦</span><span style="font-size: 12pt;">&#8221; ("silk braid") = 0x</span><span style="color: #0000ff; font-size: 12pt;">CCD0</span><span style="font-size: 12pt;"> (GBK), the regex library will happily match the two bytes in the middle, which belong to two different characters. With wchar_t in C or std::wstring in C++ we match exactly the substring we want, with no such errors, because the engine compares character by character (not byte by byte) as it iterates.</span></p>
<div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><!--<br />
<br />
Code highlighting produced by Actipro CodeHighlighter (freeware)<br />
http://www.CodeHighlighter.com/<br />
<br />
--><span style="color: #008000;">&gt;&gt;&gt;</span><span style="color: #008000; ">&nbsp;# Python 2 session on a console whose encoding is GBK, so str1 and str2 are GBK byte strings</span><br />
<span style="color: #008000;">&gt;&gt;&gt;</span><span style="color: #000000; ">&nbsp;</span><span style="color: #0000FF; ">import</span><span style="color: #000000; ">&nbsp;re</span><br />
<span style="color: #008000;">&gt;&gt;&gt;</span><span style="color: #000000; ">&nbsp;str1&nbsp;=&nbsp;</span><span style="color: #800000; ">"绦"</span><br />
<span style="color: #008000;">&gt;&gt;&gt;</span><span style="color: #000000; ">&nbsp;str2&nbsp;=&nbsp;</span><span style="color: #800000; ">"程序"</span><br />
<span style="color: #008000;">&gt;&gt;&gt;</span><span style="color: #000000; ">&nbsp;</span><span style="color: #0000FF; ">print</span><span style="color: #000000; ">&nbsp;re.findall(str1,&nbsp;str2)</span><br />
<span style="color: #000000; ">[</span><span style="color: #800000; ">'\xcc\xd0'</span><span style="color: #000000; ">]</span><br />
<span style="color: #008000;">&gt;&gt;&gt;</span><span style="color: #000000; ">&nbsp;</span><span style="color: #0000FF; ">print</span><span style="color: #000000; ">&nbsp;re.findall(str1.decode(</span><span style="color: #800000; ">"gbk"</span><span style="color: #000000; ">),&nbsp;str2.decode(</span><span style="color: #800000; ">"gbk"</span><span style="color: #000000; ">))</span><br />
<span style="color: #000000; ">[]</span></div>
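<p><strong><span style="font-size: 12pt;">So whenever you process strings that may contain Chinese, first decode the user's input into Unicode. Once processing is done and the result is to be displayed or saved, encode it into whatever charset is required.</span></strong></p>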
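<p><span style="font-size: 12pt;">The same false match can be reproduced in Python 3, where byte strings and Unicode strings are separate types. This is only a sketch: the variable names are invented for the illustration, and the byte values are the GBK codes quoted in this post.</span></p>

```python
import re

word_gbk = b"\xb3\xcc\xd0\xf2"     # the two-character word from this post, as GBK bytes
needle_gbk = b"\xcc\xd0"           # the one-character search term, as GBK bytes

# Byte-level search: the needle's two bytes straddle the boundary
# between the first and second character of the word, so it "matches".
byte_hits = re.findall(re.escape(needle_gbk), word_gbk)

# Character-level search: decode first, then match on Unicode strings.
word = word_gbk.decode("gbk")
needle = needle_gbk.decode("gbk")
char_hits = re.findall(re.escape(needle), word)

print(byte_hits)    # one spurious hit
print(char_hits)    # no hit, because the character is not in the word
```

<p><span style="font-size: 12pt;">Decoding before matching is what removes the spurious hit; the pattern and the text must both be decoded, since Python 3 refuses to mix bytes and str in one search.</span></p>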
<h4><br style="color: #333333;" />
<span></span></h4>
<h4><span style="color: #333333; font-size: 10pt;"><em>Appendix</em></span></h4>
<p><em style="color: #333333; font-size: 10pt;">Use different encodings in different places:</em></p>
<ul>
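     <li><em style="color: #333333; font-size: 10pt;">Text sent over the network (web pages, for example) generally uses UTF-8, because most of it is English, markup included, with only a little Chinese, which keeps UTF-8 compact.</em></li>
     <li><em style="color: #333333; font-size: 10pt;">When saving to a local file, Unicode (UTF-16) is a reasonable choice: local storage is plentiful and it saves the decoding step on load, although decoding is only O(N) in any case.</em></li>
     <li><em style="color: #333333; font-size: 10pt;">For display, use the system encoding: GBK on Simplified Chinese Windows, Big5 on Traditional Chinese Windows, and UTF-8 on most Linux systems (it follows the locale).</em></li>
     <li><em style="color: #333333; font-size: 10pt;">Write the few Chinese strings that appear in source code as hex escapes such as <em style="color: #ff6600;">"\x????\x????"</em> where possible; if there is a lot of Chinese, load it from outside the code with gettext, resource files, or the like.</em></li>
     <li><em style="color: #333333; font-size: 10pt;">Qt uses Unicode internally (QString holds UTF-16), so in a Qt application a wide string can be handed over for display without any code page conversion, for example through QString::fromWCharArray().</em></li>
     <li><em style="color: #333333; font-size: 10pt;">NTFS stores file names and paths as <strike>GBK</strike> UTF-16LE, so on Windows a path entered by the user needs no decoding, as long as it is received as a wide string.</em></li>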
</ul>
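<p><span style="font-size: 12pt;">A small Python 3 sketch of the "decode on the way in, encode on the way out" habit that the list above relies on. The helper save_and_reload and the use of a temporary file are assumptions made for the illustration; the hex-escaped literal is the GBK word used earlier in this post.</span></p>

```python
import os
import tempfile

# A literal written with hex escapes survives whatever encoding the source file is saved in.
gbk_literal = b"\xb3\xcc\xd0\xf2"
text = gbk_literal.decode("gbk")      # decode once, at the input boundary


def save_and_reload(text, encoding):
    """Write text in the given encoding, then return the raw bytes and the re-decoded text."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)             # encode at the output boundary
        with open(path, "rb") as f:
            raw = f.read()
        with open(path, "r", encoding=encoding) as f:
            return raw, f.read()
    finally:
        os.remove(path)


for enc in ("utf-8", "gbk", "utf-16-le"):
    raw, restored = save_and_reload(text, enc)
    print(enc, len(raw), restored == text)
```

<p><span style="font-size: 12pt;">Inside the program the text is always a Unicode string; the encoding is named explicitly only where bytes enter or leave, so switching the on-disk charset is a one-argument change.</span></p>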
<p><br />
</p><img src ="http://www.cppblog.com/Shihira/aggbug/200124.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/Shihira/" target="_blank">Shihira</a> 2013-10-28 22:49 <a href="http://www.cppblog.com/Shihira/archive/2013/10/28/200124.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>