C++博客 - Robert' Game Programming - Category: Miscellaneous Knowledge

http://www.cppblog.com/roberthu/category/13882.html
Last updated: Sun, 23 May 2010 12:29:40 GMT

Memory Alignment
http://www.cppblog.com/roberthu/articles/116175.html
Robert.Hu, Sun, 23 May 2010 08:59:00 GMT

Robert.Hu 2010-05-23 16:59

Unicode, ANSI, and UTF-8 Encodings
http://www.cppblog.com/roberthu/articles/116173.html
Robert.Hu, Sun, 23 May 2010 08:57:00 GMT

I ran into these encodings at work without really understanding them, asked around, and got looked down on for it. Today I read up on the subject; here is a rough summary:

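Unicode: Assigns a code to the characters of every language in the world, Chinese included. In the two-byte form that Windows calls "Unicode" (UCS-2/UTF-16), a Chinese character takes two bytes and an English letter also takes two bytes, so storing English text this way wastes space. (Characters outside the Basic Multilingual Plane take four bytes in UTF-16.) Because each unit is two bytes wide, byte order becomes an issue: for a Chinese character whose Unicode value is 594E, should it be written big-endian as 59 4E, or little-endian as 4E 59?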
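The byte-order question can be made concrete with a small sketch. The function names `to_big_endian` and `to_little_endian` are hypothetical helpers written for this illustration, not part of any library; they simply split one 16-bit unit into its two bytes in each order.

```cpp
#include <array>
#include <cstdint>

// Split a 16-bit code unit into two bytes, most significant byte first.
std::array<std::uint8_t, 2> to_big_endian(std::uint16_t unit) {
    return {static_cast<std::uint8_t>(unit >> 8),
            static_cast<std::uint8_t>(unit & 0xFF)};
}

// The same code unit, least significant byte first.
std::array<std::uint8_t, 2> to_little_endian(std::uint16_t unit) {
    return {static_cast<std::uint8_t>(unit & 0xFF),
            static_cast<std::uint8_t>(unit >> 8)};
}
```

For the value 594E used above, `to_big_endian(0x594E)` yields the bytes 59 4E and `to_little_endian(0x594E)` yields 4E 59. A reader of the byte stream has to know which order was used, which is why such files often begin with a byte order mark.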

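UTF-8: Another internationally used way of representing characters. Its unit is a single byte, and a character occupies one or more bytes depending on its Unicode value: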

0000 - 007F  0xxxxxxx                    characters that fit in one byte
0080 - 07FF  110xxxxx 10xxxxxx           characters stored in two bytes
0800 - FFFF  1110xxxx 10xxxxxx 10xxxxxx  characters that need three bytes
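The table above translates fairly directly into code. The following is a minimal sketch, assuming the input is a single value in the range 0000 to FFFF; the name `encode_utf8_bmp` is made up for this example, and surrogate values and anything above FFFF are deliberately not handled.

```cpp
#include <cstdint>
#include <string>

// Encode one value in the range 0000-FFFF as UTF-8, following the three
// rows of the table. Assumption: the caller does not pass a surrogate
// (D800-DFFF); values above FFFF cannot be represented in the parameter.
std::string encode_utf8_bmp(std::uint16_t cp) {
    std::string out;
    if (cp <= 0x007F) {
        // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x07FF) {
        // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, the value 594E falls in the third row and comes out as the three bytes E5 A5 8E, while the letter "A" (0041) stays a single byte.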

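When reading such a byte stream, the leading bits of each byte show whether it starts a character and how many bytes that character spans, so the characters are easy to pick out. Chinese characters in the 0800 - FFFF range are stored in three bytes.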
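As a rough illustration of how the leading bits identify a character, the sketch below classifies a byte by its prefix and counts characters in a UTF-8 string. Both function names are hypothetical, the input is assumed to be well-formed UTF-8, and only the one- to three-byte forms listed in the table are recognized by the classifier.

```cpp
#include <cstddef>
#include <string>

// Return how many bytes the sequence starting with `lead` occupies,
// or 0 if `lead` is a continuation byte (10xxxxxx) or a form that
// the table above does not list.
int utf8_sequence_length(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    return 0;
}

// Count characters by counting every byte that is not a continuation
// byte. Assumption: `s` holds well-formed UTF-8.
std::size_t count_utf8_chars(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For example, the four bytes 41 E5 A5 8E contain two characters: the letter "A" and one three-byte character.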

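ANSI and GBK: In ANSI encodings an English character takes one byte and a Chinese character takes two. A Chinese character is recognized by its first (high) byte, whose most significant bit is 1 rather than 0.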
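The high-bit rule can be sketched as a simple scan. The function name `count_dbcs_chars` is invented for this example, and the sketch assumes the input is a valid GBK-style string in which every byte with the top bit set at a character boundary starts a two-byte character.

```cpp
#include <cstddef>
#include <string>

// Count characters in a GBK-style double-byte string.
// Assumption: at a character boundary, a byte with its top bit set is the
// first byte of a two-byte character; any other byte is a one-byte character.
std::size_t count_dbcs_chars(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        // Skip the second byte without inspecting it: in GBK it may
        // itself fall below 0x80, so only the first byte is tested.
        i += (c & 0x80) ? 2 : 1;
    }
    return n;
}
```

The string has to be scanned from the start, because a second byte on its own cannot be told apart from a one-byte character. As an example, the GB2312 bytes D6 D0 CE C4 count as two characters.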

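Different countries and regions defined their own standards, which produced GB2312, BIG5, JIS, and other encodings. These extended encodings, which use two bytes to represent one character, are collectively called ANSI encodings. On a Simplified Chinese system, "ANSI" means GB2312 (on Windows, code page 936, i.e. GBK, a superset of GB2312); on a Japanese system, "ANSI" means JIS (on Windows, code page 932, i.e. Shift_JIS).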

Robert.Hu 2010-05-23 16:57
