<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++ Blog - 程序描绘人生 - Category: Reposts</title><link>http://www.cppblog.com/humanchao/category/20275.html</link><description>Knowledge changes destiny; learning builds the future.</description><language>zh-cn</language><lastBuildDate>Sun, 01 Mar 2015 11:44:55 GMT</lastBuildDate><pubDate>Sun, 01 Mar 2015 11:44:55 GMT</pubDate><ttl>60</ttl><item><title>Repost: GB 2312 Level-1 and Level-2 Chinese Characters</title><link>http://www.cppblog.com/humanchao/archive/2015/02/25/209857.html</link><dc:creator>胡满超</dc:creator><author>胡满超</author><pubDate>Wed, 25 Feb 2015 03:16:00 GMT</pubDate><guid>http://www.cppblog.com/humanchao/archive/2015/02/25/209857.html</guid><wfw:comment>http://www.cppblog.com/humanchao/comments/209857.html</wfw:comment><comments>http://www.cppblog.com/humanchao/archive/2015/02/25/209857.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/humanchao/comments/commentRss/209857.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/humanchao/services/trackbacks/209857.html</trackback:ping><description><![CDATA[<div><span style="font-size: 14pt;">GB 2312 Level-1 characters (3,755)</span></div><div><br /><span style="font-size: 
14pt;">啊阿埃挨哎唉哀皑癌蔼矮艾碍爱隘鞍氨安俺按暗岸胺案肮昂盎凹敖熬翱袄傲奥懊澳芭捌扒叭吧笆八疤巴拔跋靶把耙坝霸罢爸白柏百摆佰败拜稗斑班搬扳般颁板版扮拌伴瓣半办绊邦帮梆榜膀绑棒磅蚌镑傍谤苞胞包褒剥薄雹保堡饱宝抱报暴豹鲍爆杯碑悲卑北辈背贝钡倍狈备惫焙被奔苯本笨崩绷甭泵蹦迸逼鼻比鄙笔彼碧蓖蔽毕毙毖币庇痹闭敝弊必辟壁臂避陛鞭边编贬扁便变卞辨辩辫遍标彪膘表鳖憋别瘪彬斌濒滨宾摈兵冰柄丙秉饼炳病并玻菠播拨钵波博勃搏铂箔伯帛舶脖膊渤泊驳捕卜哺补埠不布步簿部怖擦猜裁材才财睬踩采彩菜蔡餐参蚕残惭惨灿苍舱仓沧藏操糙槽曹草厕策侧册测层蹭插叉茬茶查碴搽察岔差诧拆柴豺搀掺蝉馋谗缠铲产阐颤昌猖场尝常长偿肠厂敞畅唱倡超抄钞朝嘲潮巢吵炒车扯撤掣彻澈郴臣辰尘晨忱沉陈趁衬撑称城橙成呈乘程惩澄诚承逞骋秤吃痴持匙池迟弛驰耻齿侈尺赤翅斥炽充冲虫崇宠抽酬畴踌稠愁筹仇绸瞅丑臭初出橱厨躇锄雏滁除楚础储矗搐触处揣川穿椽传船喘串疮窗幢床闯创吹炊捶锤垂春椿醇唇淳纯蠢戳绰疵茨磁雌辞慈瓷词此刺赐次聪葱囱匆从丛凑粗醋簇促蹿篡窜摧崔催脆瘁粹淬翠村存寸磋撮搓措挫错搭达答瘩打大呆歹傣戴带殆代贷袋待逮怠耽担丹单郸掸胆旦氮但惮淡诞弹蛋当挡党荡档刀捣蹈倒岛祷导到稻悼道盗德得的蹬灯登等瞪凳邓堤低滴迪敌笛狄涤翟嫡抵底地蒂第帝弟递缔颠掂滇碘点典靛垫电佃甸店惦奠淀殿碉叼雕凋刁掉吊钓调跌爹碟蝶迭谍叠丁盯叮钉顶鼎锭定订丢东冬董懂动栋侗恫冻洞兜抖斗陡豆逗痘都督毒犊独读堵睹赌杜镀肚度渡妒端短锻段断缎堆兑队对墩吨蹲敦顿囤钝盾遁掇哆多夺垛躲朵跺舵剁惰堕蛾峨鹅俄额讹娥恶厄扼遏鄂饿恩而儿耳尔饵洱二贰发罚筏伐乏阀法珐藩帆番翻樊矾钒繁凡烦反返范贩犯饭泛坊芳方肪房防妨仿访纺放菲非啡飞肥匪诽吠肺废沸费芬酚吩氛分纷坟焚汾粉奋份忿愤粪丰封枫蜂峰锋风疯烽逢冯缝讽奉凤佛否夫敷肤孵扶拂辐幅氟符伏俘服浮涪福袱弗甫抚辅俯釜斧脯腑府腐赴副覆赋复傅付阜父腹负富讣附妇缚咐噶嘎该改概钙盖溉干甘杆柑竿肝赶感秆敢赣冈刚钢缸肛纲岗港杠篙皋高膏羔糕搞镐稿告哥歌搁戈鸽胳疙割革葛格蛤阁隔铬个各给根跟耕更庚羹埂耿梗工攻功恭龚供躬公宫弓巩汞拱贡共钩勾沟苟狗垢构购够辜菇咕箍估沽孤姑鼓古蛊骨谷股故顾固雇刮瓜剐寡挂褂乖拐怪棺关官冠观管馆罐惯灌贯光广逛瑰规圭硅归龟闺轨鬼诡癸桂柜跪贵刽辊滚棍锅郭国果裹过哈骸孩海氦亥害骇酣憨邯韩含涵寒函喊罕翰撼捍旱憾悍焊汗汉夯杭航壕嚎豪毫郝好耗号浩呵喝荷菏核禾和何合盒貉阂河涸赫褐鹤贺嘿黑痕很狠恨哼亨横衡恒轰哄烘虹鸿洪宏弘红喉侯猴吼厚候后呼乎忽瑚壶葫胡蝴狐糊湖弧虎唬护互沪户花哗华猾滑画划化话槐徊怀淮坏欢环桓还缓换患唤痪豢焕涣宦幻荒慌黄磺蝗簧皇凰惶煌晃幌恍谎灰挥辉徽恢蛔回毁悔慧卉惠晦贿秽会烩汇讳诲绘荤昏婚魂浑混豁活伙火获或惑霍货祸击圾基机畸稽积箕肌饥迹激讥鸡姬绩缉吉极棘辑籍集及急疾汲即嫉级挤几脊己蓟技冀季伎祭剂悸济寄寂计记既忌际妓继纪嘉枷夹佳家加荚颊贾甲钾假稼价架驾嫁歼监坚尖笺间煎兼肩艰奸缄茧检柬碱硷拣捡简俭剪减荐槛鉴践贱见键箭件健舰剑饯渐溅涧建僵姜将浆江疆蒋桨奖讲匠酱降蕉椒礁焦胶交郊浇骄娇嚼搅铰矫侥脚狡角饺缴绞剿教酵轿较叫窖揭接皆秸街阶截劫节桔杰捷睫竭洁结解姐戒藉芥界借介疥诫届巾筋斤金今津襟紧锦仅谨进靳晋禁近烬浸尽劲荆兢茎睛晶鲸京惊精粳经井警景颈静境敬镜径痉靖竟竞净炯窘揪究纠玖韭久灸九酒厩救旧臼舅咎就疚鞠拘狙疽居驹菊局咀矩举沮聚拒据巨具距踞锯俱句惧炬剧捐鹃娟倦眷卷绢撅攫抉掘倔爵觉决诀绝均菌钧军君峻俊竣浚郡骏喀咖卡咯开揩楷凯慨刊堪勘坎砍看康慷糠扛抗亢炕考拷烤靠坷苛柯棵磕颗科壳咳可渴克刻客课肯啃垦恳坑吭空恐孔控抠口扣寇枯哭窟苦酷库裤夸垮挎跨胯块筷侩快宽款匡筐狂框矿眶旷况亏盔岿窥葵奎魁傀馈愧溃坤昆捆困括扩廓阔垃拉喇蜡腊辣啦莱来赖蓝婪栏拦篮阑兰澜谰揽览懒缆烂滥琅榔狼廊郎朗浪捞劳牢老佬姥酪烙涝勒乐雷镭蕾磊累儡垒擂肋类泪棱楞冷厘梨犁黎篱狸离漓理李里鲤礼莉荔吏栗丽厉励砾历利傈例俐痢立粒沥隶力璃哩俩联莲连镰廉怜涟帘敛脸链恋炼练粮凉梁粱良两辆量晾亮谅撩聊僚疗燎寥辽潦了撂镣廖料列裂烈劣猎琳林磷霖临邻鳞淋凛赁吝拎玲菱零龄铃伶羚凌灵陵岭领另令溜琉榴硫馏留刘瘤流柳六龙聋咙笼窿隆垄拢陇楼娄搂篓漏陋芦卢颅庐炉掳卤虏鲁麓碌露路赂鹿潞禄录陆戮驴吕铝侣旅履屡缕虑氯律率滤绿峦挛孪滦卵乱掠略抡轮伦仑沦纶论萝螺罗逻锣箩骡裸落洛骆络妈麻玛码蚂马骂嘛吗埋买麦卖迈脉瞒馒蛮满蔓曼慢漫谩芒茫盲氓忙莽猫茅锚毛矛铆卯茂冒帽貌贸么玫枚梅酶霉煤没眉媒镁每美昧寐妹媚门闷们萌蒙檬盟锰猛梦孟眯醚靡糜迷谜弥米秘觅泌蜜密幂棉眠绵冕免勉娩缅面苗描瞄藐秒渺庙妙蔑灭民抿皿敏悯闽明螟鸣铭名命谬摸摹蘑模膜磨摩魔抹末莫墨默沫漠寞陌谋牟某拇牡亩姆母墓暮幕募慕木目睦牧穆拿哪呐钠那娜纳氖乃奶耐奈南男难囊挠脑恼闹淖呢馁内嫩能妮霓倪泥尼拟你匿腻逆溺蔫拈年碾撵捻念娘酿鸟尿捏聂孽啮镊镍涅您柠狞凝宁拧泞牛扭钮纽脓浓农弄奴努怒女暖虐疟挪懦
糯诺哦欧鸥殴藕呕偶沤啪趴爬帕怕琶拍排牌徘湃派攀潘盘磐盼畔判叛乓庞旁耪胖抛咆刨炮袍跑泡呸胚培裴赔陪配佩沛喷盆砰抨烹澎彭蓬棚硼篷膨朋鹏捧碰坯砒霹批披劈琵毗啤脾疲皮匹痞僻屁譬篇偏片骗飘漂瓢票撇瞥拼频贫品聘乒坪苹萍平凭瓶评屏坡泼颇婆破魄迫粕剖扑铺仆莆葡菩蒲埔朴圃普浦谱曝瀑期欺栖戚妻七凄漆柒沏其棋奇歧畦崎脐齐旗祈祁骑起岂乞企启契砌器气迄弃汽泣讫掐恰洽牵扦钎铅千迁签仟谦乾黔钱钳前潜遣浅谴堑嵌欠歉枪呛腔羌墙蔷强抢橇锹敲悄桥瞧乔侨巧鞘撬翘峭俏窍切茄且怯窃钦侵亲秦琴勤芹擒禽寝沁青轻氢倾卿清擎晴氰情顷请庆琼穷秋丘邱球求囚酋泅趋区蛆曲躯屈驱渠取娶龋趣去圈颧权醛泉全痊拳犬券劝缺炔瘸却鹊榷确雀裙群然燃冉染瓤壤攘嚷让饶扰绕惹热壬仁人忍韧任认刃妊纫扔仍日戎茸蓉荣融熔溶容绒冗揉柔肉茹蠕儒孺如辱乳汝入褥软阮蕊瑞锐闰润若弱撒洒萨腮鳃塞赛三叁伞散桑嗓丧搔骚扫嫂瑟色涩森僧莎砂杀刹沙纱傻啥煞筛晒珊苫杉山删煽衫闪陕擅赡膳善汕扇缮墒伤商赏晌上尚裳梢捎稍烧芍勺韶少哨邵绍奢赊蛇舌舍赦摄射慑涉社设砷申呻伸身深娠绅神沈审婶甚肾慎渗声生甥牲升绳省盛剩胜圣师失狮施湿诗尸虱十石拾时什食蚀实识史矢使屎驶始式示士世柿事拭誓逝势是嗜噬适仕侍释饰氏市恃室视试收手首守寿授售受瘦兽蔬枢梳殊抒输叔舒淑疏书赎孰熟薯暑曙署蜀黍鼠属术述树束戍竖墅庶数漱恕刷耍摔衰甩帅栓拴霜双爽谁水睡税吮瞬顺舜说硕朔烁斯撕嘶思私司丝死肆寺嗣四伺似饲巳松耸怂颂送宋讼诵搜艘擞嗽苏酥俗素速粟僳塑溯宿诉肃酸蒜算虽隋随绥髓碎岁穗遂隧祟孙损笋蓑梭唆缩琐索锁所塌他它她塔獭挞蹋踏胎苔抬台泰酞太态汰坍摊贪瘫滩坛檀痰潭谭谈坦毯袒碳探叹炭汤塘搪堂棠膛唐糖倘躺淌趟烫掏涛滔绦萄桃逃淘陶讨套特藤腾疼誊梯剔踢锑提题蹄啼体替嚏惕涕剃屉天添填田甜恬舔腆挑条迢眺跳贴铁帖厅听烃汀廷停亭庭挺艇通桐酮瞳同铜彤童桶捅筒统痛偷投头透凸秃突图徒途涂屠土吐兔湍团推颓腿蜕褪退吞屯臀拖托脱鸵陀驮驼椭妥拓唾挖哇蛙洼娃瓦袜歪外豌弯湾玩顽丸烷完碗挽晚皖惋宛婉万腕汪王亡枉网往旺望忘妄威巍微危韦违桅围唯惟为潍维苇萎委伟伪尾纬未蔚味畏胃喂魏位渭谓尉慰卫瘟温蚊文闻纹吻稳紊问嗡翁瓮挝蜗涡窝我斡卧握沃巫呜钨乌污诬屋无芜梧吾吴毋武五捂午舞伍侮坞戊雾晤物勿务悟误昔熙析西硒矽晰嘻吸锡牺稀息希悉膝夕惜熄烯溪汐犀檄袭席习媳喜铣洗系隙戏细瞎虾匣霞辖暇峡侠狭下厦夏吓掀锨先仙鲜纤咸贤衔舷闲涎弦嫌显险现献县腺馅羡宪陷限线相厢镶香箱襄湘乡翔祥详想响享项巷橡像向象萧硝霄削哮嚣销消宵淆晓小孝校肖啸笑效楔些歇蝎鞋协挟携邪斜胁谐写械卸蟹懈泄泻谢屑薪芯锌欣辛新忻心信衅星腥猩惺兴刑型形邢行醒幸杏性姓兄凶胸匈汹雄熊休修羞朽嗅锈秀袖绣墟戌需虚嘘须徐许蓄酗叙旭序畜恤絮婿绪续轩喧宣悬旋玄选癣眩绚靴薛学穴雪血勋熏循旬询寻驯巡殉汛训讯逊迅压押鸦鸭呀丫芽牙蚜崖衙涯雅哑亚讶焉咽阉烟淹盐严研蜒岩延言颜阎炎沿奄掩眼衍演艳堰燕厌砚雁唁彦焰宴谚验殃央鸯秧杨扬佯疡羊洋阳氧仰痒养样漾邀腰妖瑶摇尧遥窑谣姚咬舀药要耀椰噎耶爷野冶也页掖业叶曳腋夜液一壹医揖铱依伊衣颐夷遗移仪胰疑沂宜姨彝椅蚁倚已乙矣以艺抑易邑屹亿役臆逸肄疫亦裔意毅忆义益溢诣议谊译异翼翌绎茵荫因殷音阴姻吟银淫寅饮尹引隐印英樱婴鹰应缨莹萤营荧蝇迎赢盈影颖硬映哟拥佣臃痈庸雍踊蛹咏泳涌永恿勇用幽优悠忧尤由邮铀犹油游酉有友右佑釉诱又幼迂淤于盂榆虞愚舆余俞逾鱼愉渝渔隅予娱雨与屿禹宇语羽玉域芋郁吁遇喻峪御愈欲狱育誉浴寓裕预豫驭鸳渊冤元垣袁原援辕园员圆猿源缘远苑愿怨院曰约越跃钥岳粤月悦阅耘云郧匀陨允运蕴酝晕韵孕匝砸杂栽哉灾宰载再在咱攒暂赞赃脏葬遭糟凿藻枣早澡蚤躁噪造皂灶燥责择则泽贼怎增憎曾赠扎喳渣札轧铡闸眨栅榨咋乍炸诈摘斋宅窄债寨瞻毡詹粘沾盏斩辗崭展蘸栈占战站湛绽樟章彰漳张掌涨杖丈帐账仗胀瘴障招昭找沼赵照罩兆肇召遮折哲蛰辙者锗蔗这浙珍斟真甄砧臻贞针侦枕疹诊震振镇阵蒸挣睁征狰争怔整拯正政帧症郑证芝枝支吱蜘知肢脂汁之织职直植殖执值侄址指止趾只旨纸志挚掷至致置帜峙制智秩稚质炙痔滞治窒中盅忠钟衷终种肿重仲众舟周州洲诌粥轴肘帚咒皱宙昼骤珠株蛛朱猪诸诛逐竹烛煮拄瞩嘱主著柱助蛀贮铸筑住注祝驻抓爪拽专砖转撰赚篆桩庄装妆撞壮状椎锥追赘坠缀谆准捉拙卓桌琢茁酌啄着灼浊兹咨资姿滋淄孜紫仔籽滓子自渍字鬃棕踪宗综总纵邹走奏揍租足卒族祖诅阻组钻纂嘴醉最罪尊遵昨左佐柞做作坐座</span></div><div>&nbsp;</div><div><span style="font-size: 14pt;">二级国标汉字（3008个）</span></div><div><br /><span style="font-size: 
14pt;">亍丌兀丐廿卅丕亘丞鬲孬噩丨禺丿匕乇夭爻卮氐囟胤馗毓睾鼗丶亟鼐乜乩亓芈孛啬嘏仄厍厝厣厥厮靥赝匚叵匦匮匾赜卦卣刂刈刎刭刳刿剀剌剞剡剜蒯剽劂劁劐劓冂罔亻仃仉仂仨仡仫仞伛仳伢佤仵伥伧伉伫佞佧攸佚佝佟佗伲伽佶佴侑侉侃侏佾佻侪佼侬侔俦俨俪俅俚俣俜俑俟俸倩偌俳倬倏倮倭俾倜倌倥倨偾偃偕偈偎偬偻傥傧傩傺僖儆僭僬僦僮儇儋仝氽佘佥俎龠汆籴兮巽黉馘冁夔勹匍訇匐凫夙兕亠兖亳衮袤亵脔裒禀嬴蠃羸冫冱冽冼凇冖冢冥讠讦讧讪讴讵讷诂诃诋诏诎诒诓诔诖诘诙诜诟诠诤诨诩诮诰诳诶诹诼诿谀谂谄谇谌谏谑谒谔谕谖谙谛谘谝谟谠谡谥谧谪谫谮谯谲谳谵谶卩卺阝阢阡阱阪阽阼陂陉陔陟陧陬陲陴隈隍隗隰邗邛邝邙邬邡邴邳邶邺邸邰郏郅邾郐郄郇郓郦郢郜郗郛郫郯郾鄄鄢鄞鄣鄱鄯鄹酃酆刍奂劢劬劭劾哿勐勖勰叟燮矍廴凵凼鬯厶弁畚巯坌垩垡塾墼壅壑圩圬圪圳圹圮圯坜圻坂坩垅坫垆坼坻坨坭坶坳垭垤垌垲埏垧垴垓垠埕埘埚埙埒垸埴埯埸埤埝堋堍埽埭堀堞堙塄堠塥塬墁墉墚墀馨鼙懿艹艽艿芏芊芨芄芎芑芗芙芫芸芾芰苈苊苣芘芷芮苋苌苁芩芴芡芪芟苄苎芤苡茉苷苤茏茇苜苴苒苘茌苻苓茑茚茆茔茕苠苕茜荑荛荜茈莒茼茴茱莛荞茯荏荇荃荟荀茗荠茭茺茳荦荥荨茛荩荬荪荭荮莰荸莳莴莠莪莓莜莅荼莶莩荽莸荻莘莞莨莺莼菁萁菥菘堇萘萋菝菽菖萜萸萑萆菔菟萏萃菸菹菪菅菀萦菰菡葜葑葚葙葳蒇蒈葺蒉葸萼葆葩葶蒌蒎萱葭蓁蓍蓐蓦蒽蓓蓊蒿蒺蓠蒡蒹蒴蒗蓥蓣蔌甍蔸蓰蔹蔟蔺蕖蔻蓿蓼蕙蕈蕨蕤蕞蕺瞢蕃蕲蕻薤薨薇薏蕹薮薜薅薹薷薰藓藁藜藿蘧蘅蘩蘖蘼廾弈夼奁耷奕奚奘匏尢尥尬尴扌扪抟抻拊拚拗拮挢拶挹捋捃掭揶捱捺掎掴捭掬掊捩掮掼揲揸揠揿揄揞揎摒揆掾摅摁搋搛搠搌搦搡摞撄摭撖摺撷撸撙撺擀擐擗擤擢攉攥攮弋忒甙弑卟叱叽叩叨叻吒吖吆呋呒呓呔呖呃吡呗呙吣吲咂咔呷呱呤咚咛咄呶呦咝哐咭哂咴哒咧咦哓哔呲咣哕咻咿哌哙哚哜咩咪咤哝哏哞唛哧唠哽唔哳唢唣唏唑唧唪啧喏喵啉啭啁啕唿啐唼唷啖啵啶啷唳唰啜喋嗒喃喱喹喈喁喟啾嗖喑啻嗟喽喾喔喙嗪嗷嗉嘟嗑嗫嗬嗔嗦嗝嗄嗯嗥嗲嗳嗌嗍嗨嗵嗤辔嘞嘈嘌嘁嘤嘣嗾嘀嘧嘭噘嘹噗嘬噍噢噙噜噌噔嚆噤噱噫噻噼嚅嚓嚯囔囗囝囡囵囫囹囿圄圊圉圜帏帙帔帑帱帻帼帷幄幔幛幞幡岌屺岍岐岖岈岘岙岑岚岜岵岢岽岬岫岱岣峁岷峄峒峤峋峥崂崃崧崦崮崤崞崆崛嵘崾崴崽嵬嵛嵯嵝嵫嵋嵊嵩嵴嶂嶙嶝豳嶷巅彳彷徂徇徉後徕徙徜徨徭徵徼衢彡犭犰犴犷犸狃狁狎狍狒狨狯狩狲狴狷猁狳猃狺狻猗猓猡猊猞猝猕猢猹猥猬猸猱獐獍獗獠獬獯獾舛夥飧夤夂饣饧饨饩饪饫饬饴饷饽馀馄馇馊馍馐馑馓馔馕庀庑庋庖庥庠庹庵庾庳赓廒廑廛廨廪膺忄忉忖忏怃忮怄忡忤忾怅怆忪忭忸怙怵怦怛怏怍怩怫怊怿怡恸恹恻恺恂恪恽悖悚悭悝悃悒悌悛惬悻悱惝惘惆惚悴愠愦愕愣惴愀愎愫慊慵憬憔憧憷懔懵忝隳闩闫闱闳闵闶闼闾阃阄阆阈阊阋阌阍阏阒阕阖阗阙阚丬爿戕氵汔汜汊沣沅沐沔沌汨汩汴汶沆沩泐泔沭泷泸泱泗沲泠泖泺泫泮沱泓泯泾洹洧洌浃浈洇洄洙洎洫浍洮洵洚浏浒浔洳涑浯涞涠浞涓涔浜浠浼浣渚淇淅淞渎涿淠渑淦淝淙渖涫渌涮渫湮湎湫溲湟溆湓湔渲渥湄滟溱溘滠漭滢溥溧溽溻溷滗溴滏溏滂溟潢潆潇漤漕滹漯漶潋潴漪漉漩澉澍澌潸潲潼潺濑濉澧澹澶濂濡濮濞濠濯瀚瀣瀛瀹瀵灏灞宀宄宕宓宥宸甯骞搴寤寮褰寰蹇謇辶迓迕迥迮迤迩迦迳迨逅逄逋逦逑逍逖逡逵逶逭逯遄遑遒遐遨遘遢遛暹遴遽邂邈邃邋彐彗彖彘尻咫屐屙孱屣屦羼弪弩弭艴弼鬻屮妁妃妍妩妪妣妗姊妫妞妤姒妲妯姗妾娅娆姝娈姣姘姹娌娉娲娴娑娣娓婀婧婊婕娼婢婵胬媪媛婷婺媾嫫媲嫒嫔媸嫠嫣嫱嫖嫦嫘嫜嬉嬗嬖嬲嬷孀尕尜孚孥孳孑孓孢驵驷驸驺驿驽骀骁骅骈骊骐骒骓骖骘骛骜骝骟骠骢骣骥骧纟纡纣纥纨纩纭纰纾绀绁绂绉绋绌绐绔绗绛绠绡绨绫绮绯绱绲缍绶绺绻绾缁缂缃缇缈缋缌缏缑缒缗缙缜缛缟缡缢缣缤缥缦缧缪缫缬缭缯缰缱缲缳缵幺畿巛甾邕玎玑玮玢玟珏珂珑玷玳珀珉珈珥珙顼琊珩珧珞玺珲琏琪瑛琦琥琨琰琮琬琛琚瑁瑜瑗瑕瑙瑷瑭瑾璜璎璀璁璇璋璞璨璩璐璧瓒璺韪韫韬杌杓杞杈杩枥枇杪杳枘枧杵枨枞枭枋杷杼柰栉柘栊柩枰栌柙枵柚枳柝栀柃枸柢栎柁柽栲栳桠桡桎桢桄桤梃栝桕桦桁桧桀栾桊桉栩梵梏桴桷梓桫棂楮棼椟椠棹椤棰椋椁楗棣椐楱椹楠楂楝榄楫榀榘楸椴槌榇榈槎榉楦楣楹榛榧榻榫榭槔榱槁槊槟榕槠榍槿樯槭樗樘橥槲橄樾檠橐橛樵檎橹樽樨橘橼檑檐檩檗檫猷獒殁殂殇殄殒殓殍殚殛殡殪轫轭轱轲轳轵轶轸轷轹轺轼轾辁辂辄辇辋辍辎辏辘辚軎戋戗戛戟戢戡戥戤戬臧瓯瓴瓿甏甑甓攴旮旯旰昊昙杲昃昕昀炅曷昝昴昱昶昵耆晟晔晁晏晖晡晗晷暄暌暧暝暾曛曜曦曩贲贳贶贻贽赀赅赆赈赉赇赍赕赙觇觊觋觌觎觏觐觑牮犟牝牦牯牾牿犄犋犍犏犒挈挲掰搿擘耄毪毳毽毵毹氅氇氆氍氕氘氙氚氡氩氤氪氲攵敕敫牍牒牖爰虢刖肟肜肓肼朊肽肱肫肭肴肷胧胨胩胪胛胂胄胙胍胗朐胝胫胱胴胭脍脎胲胼朕脒豚脶脞脬脘脲腈腌腓腴腙腚腱腠腩腼腽腭腧塍媵膈膂膑滕膣膪臌朦臊膻臁膦欤欷欹歃歆歙飑飒飓飕飙飚殳彀毂觳斐齑斓於旆旄旃旌旎旒旖炀炜炖炝炻烀炷炫炱烨烊焐焓焖焯焱煳煜煨煅煲煊煸煺熘熳熵熨熠燠燔燧燹爝爨灬焘煦熹戾戽扃扈扉礻祀祆祉祛祜祓祚祢祗祠祯祧祺禅禊禚禧禳忑忐怼恝恚恧恁恙恣悫愆愍慝憩憝懋懑戆肀聿沓
泶淼矶矸砀砉砗砘砑斫砭砜砝砹砺砻砟砼砥砬砣砩硎硭硖硗砦硐硇硌硪碛碓碚碇碜碡碣碲碹碥磔磙磉磬磲礅磴礓礤礞礴龛黹黻黼盱眄眍盹眇眈眚眢眙眭眦眵眸睐睑睇睃睚睨睢睥睿瞍睽瞀瞌瞑瞟瞠瞰瞵瞽町畀畎畋畈畛畲畹疃罘罡罟詈罨罴罱罹羁罾盍盥蠲钅钆钇钋钊钌钍钏钐钔钗钕钚钛钜钣钤钫钪钭钬钯钰钲钴钶钷钸钹钺钼钽钿铄铈铉铊铋铌铍铎铐铑铒铕铖铗铙铘铛铞铟铠铢铤铥铧铨铪铩铫铮铯铳铴铵铷铹铼铽铿锃锂锆锇锉锊锍锎锏锒锓锔锕锖锘锛锝锞锟锢锪锫锩锬锱锲锴锶锷锸锼锾锿镂锵镄镅镆镉镌镎镏镒镓镔镖镗镘镙镛镞镟镝镡镢镤镥镦镧镨镩镪镫镬镯镱镲镳锺矧矬雉秕秭秣秫稆嵇稃稂稞稔稹稷穑黏馥穰皈皎皓皙皤瓞瓠甬鸠鸢鸨鸩鸪鸫鸬鸲鸱鸶鸸鸷鸹鸺鸾鹁鹂鹄鹆鹇鹈鹉鹋鹌鹎鹑鹕鹗鹚鹛鹜鹞鹣鹦鹧鹨鹩鹪鹫鹬鹱鹭鹳疒疔疖疠疝疬疣疳疴疸痄疱疰痃痂痖痍痣痨痦痤痫痧瘃痱痼痿瘐瘀瘅瘌瘗瘊瘥瘘瘕瘙瘛瘼瘢瘠癀瘭瘰瘿瘵癃瘾瘳癍癞癔癜癖癫癯翊竦穸穹窀窆窈窕窦窠窬窨窭窳衤衩衲衽衿袂袢裆袷袼裉裢裎裣裥裱褚裼裨裾裰褡褙褓褛褊褴褫褶襁襦襻疋胥皲皴矜耒耔耖耜耠耢耥耦耧耩耨耱耋耵聃聆聍聒聩聱覃顸颀颃颉颌颍颏颔颚颛颞颟颡颢颥颦虍虔虬虮虿虺虼虻蚨蚍蚋蚬蚝蚧蚣蚪蚓蚩蚶蛄蚵蛎蚰蚺蚱蚯蛉蛏蚴蛩蛱蛲蛭蛳蛐蜓蛞蛴蛟蛘蛑蜃蜇蛸蜈蜊蜍蜉蜣蜻蜞蜥蜮蜚蜾蝈蜴蜱蜩蜷蜿螂蜢蝽蝾蝻蝠蝰蝌蝮螋蝓蝣蝼蝤蝙蝥螓螯螨蟒蟆螈螅螭螗螃螫蟥螬螵螳蟋蟓螽蟑蟀蟊蟛蟪蟠蟮蠖蠓蟾蠊蠛蠡蠹蠼缶罂罄罅舐竺竽笈笃笄笕笊笫笏筇笸笪笙笮笱笠笥笤笳笾笞筘筚筅筵筌筝筠筮筻筢筲筱箐箦箧箸箬箝箨箅箪箜箢箫箴篑篁篌篝篚篥篦篪簌篾篼簏簖簋簟簪簦簸籁籀臾舁舂舄臬衄舡舢舣舭舯舨舫舸舻舳舴舾艄艉艋艏艚艟艨衾袅袈裘裟襞羝羟羧羯羰羲籼敉粑粝粜粞粢粲粼粽糁糇糌糍糈糅糗糨艮暨羿翎翕翥翡翦翩翮翳糸絷綦綮繇纛麸麴赳趄趔趑趱赧赭豇豉酊酐酎酏酤酢酡酰酩酯酽酾酲酴酹醌醅醐醍醑醢醣醪醭醮醯醵醴醺豕鹾趸跫踅蹙蹩趵趿趼趺跄跖跗跚跞跎跏跛跆跬跷跸跣跹跻跤踉跽踔踝踟踬踮踣踯踺蹀踹踵踽踱蹉蹁蹂蹑蹒蹊蹰蹶蹼蹯蹴躅躏躔躐躜躞豸貂貊貅貘貔斛觖觞觚觜觥觫觯訾謦靓雩雳雯霆霁霈霏霎霪霭霰霾龀龃龅龆龇龈龉龊龌黾鼋鼍隹隼隽雎雒瞿雠銎銮鋈錾鍪鏊鎏鐾鑫鱿鲂鲅鲆鲇鲈稣鲋鲎鲐鲑鲒鲔鲕鲚鲛鲞鲟鲠鲡鲢鲣鲥鲦鲧鲨鲩鲫鲭鲮鲰鲱鲲鲳鲴鲵鲶鲷鲺鲻鲼鲽鳄鳅鳆鳇鳊鳋鳌鳍鳎鳏鳐鳓鳔鳕鳗鳘鳙鳜鳝鳟鳢靼鞅鞑鞒鞔鞯鞫鞣鞲鞴骱骰骷鹘骶骺骼髁髀髅髂髋髌髑魅魃魇魉魈魍魑飨餍餮饕饔髟髡髦髯髫髻髭髹鬈鬏鬓鬟鬣麽麾縻麂麇麈麋麒鏖麝麟黛黜黝黠黟黢黩黧黥黪黯鼢鼬鼯鼹鼷鼽鼾齄</span></div><img src ="http://www.cppblog.com/humanchao/aggbug/209857.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/humanchao/" target="_blank">胡满超</a> 2015-02-25 11:16 <a href="http://www.cppblog.com/humanchao/archive/2015/02/25/209857.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>转：经典的String Hash算法  </title><link>http://www.cppblog.com/humanchao/archive/2012/12/26/196690.html</link><dc:creator>胡满超</dc:creator><author>胡满超</author><pubDate>Wed, 26 Dec 2012 09:08:00 
GMT</pubDate><guid>http://www.cppblog.com/humanchao/archive/2012/12/26/196690.html</guid><wfw:comment>http://www.cppblog.com/humanchao/comments/196690.html</wfw:comment><comments>http://www.cppblog.com/humanchao/archive/2012/12/26/196690.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/humanchao/comments/commentRss/196690.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/humanchao/services/trackbacks/196690.html</trackback:ping><description><![CDATA[<span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">Efficient algorithms often rely on hash tables, whose O(1) average-case lookup is hard for other data structures to match.</span><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">A hash is usually an integer: some algorithm "packs" a string into an integer, and that integer is called the string's hash. There are far more possible strings than integers of a fixed width, so different strings can share a hash value, and a hash cannot uniquely identify a string.</span><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">The hash function is therefore the core of a hash table. A hash function should be judged by its randomness, or dispersion: for an arbitrary set of keys, how evenly the keys fall into the cells of the table. The more even this distribution, the less likely two strings are to produce the same hash value (a hash collision), the more evenly the data is spread across the table, and the better the table's space utilization.</span><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">How the table is organized and how collisions are resolved also affect performance.</span><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><br 
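/><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">Linear probing is one common way to resolve collisions: when a key's home cell is already occupied, try the next cell, and so on. Below is a minimal C++ sketch under stated assumptions (a fixed capacity of 8 cells, no deletion, keys that outlive the table). The names ProbeTable, find_or_insert, same_key and times33 are hypothetical, and the hash is the Times33 recurrence described next, seeded with 5381. The probes counter makes the extra work caused by collisions visible.</span>
<pre style="color: #6e6e6e; background-color: #f6f6ed; ">
```cpp
// Toy hash table that resolves collisions by linear probing.
// Assumptions: fixed capacity, no deletion, keys outlive the table.
const int kCells = 8;

struct ProbeTable {
    const char *cell[kCells] = {};   // nullptr marks an empty cell
    int probes = 0;                  // total cells inspected: the cost of collisions
};

// Compare two C strings for equality without any library call.
bool same_key(const char *a, const char *b)
{
    while (*a && *a == *b) { ++a; ++b; }
    return *a == *b;
}

// Times33 string hash, seeded with 5381 (Bernstein's customary value).
unsigned long times33(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

// Returns the cell holding key, inserting it if absent; -1 if the table is full.
int find_or_insert(ProbeTable &t, const char *key)
{
    unsigned long home = times33(key) % kCells;      // the key's home cell
    for (int step = 0; step != kCells; ++step) {
        int i = (int)((home + step) % kCells);       // on collision, move to the next cell
        ++t.probes;
        if (t.cell[i] == nullptr) { t.cell[i] = key; return i; }
        if (same_key(t.cell[i], key)) return i;
    }
    return -1;
}
```
</pre><br 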
style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">DJBHash, commonly known as the "Times33" algorithm, is very popular. It is simple: it repeatedly multiplies by 33, following the recurrence</span><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">hash(i) = hash(i-1) * 33 + str[i]</span><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">Times33 performs well in both speed and distribution.</span><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">Other common string hash functions include:</span><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">BKDRHash, APHash, JSHash, RSHash, SDBMHash, PJWHash, ELFHash, and others. BKDRHash and APHash are also good choices. In practice, pick the hash function that suits the application, taking factors such as the character set of the keys into account.</span><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, 
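Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">A minimal C++ sketch of the recurrence above, with BKDRHash alongside for comparison. Assumptions not stated in the text: Times33 starts from 5381, as in Bernstein's commonly cited version; 131 is one of the multipliers usually quoted for BKDRHash; and djb_hash, bkdr_hash and cell_index are hypothetical names. Overflow is harmless because unsigned arithmetic in C++ wraps around modulo 2^N.</span>
<pre style="color: #6e6e6e; background-color: #f6f6ed; ">
```cpp
// Times33 / DJBHash: hash(i) = hash(i-1) * 33 + str[i].
// Assumption: the starting value 5381 follows Bernstein's common version.
unsigned long djb_hash(const char *str)
{
    unsigned long hash = 5381;
    while (*str)
        hash = hash * 33 + (unsigned char)*str++;  // cast avoids sign extension of bytes >= 0x80
    return hash;
}

// BKDRHash: the same shape of recurrence with multiplier 131
// (one of the values 31, 131, 1313, 13131, ... commonly quoted for it).
unsigned long bkdr_hash(const char *str)
{
    unsigned long hash = 0;
    while (*str)
        hash = hash * 131 + (unsigned char)*str++;
    return hash;
}

// A hash selects a table cell by reduction modulo the number of cells.
unsigned long cell_index(unsigned long hash, unsigned long n_cells)
{
    return hash % n_cells;
}
```
</pre><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">For example, djb_hash("a") is 5381 * 33 + 97 = 177670, where 97 is the ASCII code of 'a'.</span><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, 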
Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">Arash Partow, the author of APHash, maintains a very useful page that describes many hash functions and provides code for them:</span><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">http://www.partow.net/programming/hashfunctions/#RSHashFunction</span><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">The algorithm Blizzard uses is particularly clever. Known as the "One-Way Hash", it stores three hash values per entry in its hash table: one determines the entry's position, and the other two are used for verification.</span><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><br style="line-height: 25px; color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; background-color: #f6f6ed; " /><span style="color: #6e6e6e; font-family: Arial, Helvetica, simsun, u5b8bu4f53; line-height: 25px; background-color: #f6f6ed; ">Cryptographic hash functions such as MD5 are hashes too, but Chinese researchers have found practical methods for constructing MD5 collisions.</span><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/humanchao/" target="_blank">胡满超</a> 2012-12-26 17:08 <a 
href="http://www.cppblog.com/humanchao/archive/2012/12/26/196690.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Repost: Searching a Circularly Sorted Array</title><link>http://www.cppblog.com/humanchao/archive/2012/12/26/196686.html</link><dc:creator>胡满超</dc:creator><author>胡满超</author><pubDate>Wed, 26 Dec 2012 08:15:00 GMT</pubDate><guid>http://www.cppblog.com/humanchao/archive/2012/12/26/196686.html</guid><wfw:comment>http://www.cppblog.com/humanchao/comments/196686.html</wfw:comment><comments>http://www.cppblog.com/humanchao/archive/2012/12/26/196686.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/humanchao/comments/commentRss/196686.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/humanchao/services/trackbacks/196686.html</trackback:ping><description><![CDATA[<a href="http://blog.sina.com.cn/s/blog_a2498b5b01014bsg.html">http://blog.sina.com.cn/s/blog_a2498b5b01014bsg.html<br /><br /></a><p style="margin: 1em 0px 0.5em; padding: 0px; border: 0px; list-style: none; word-wrap: normal; word-break: normal; color: #464646; font-family: simsun; background-color: #bcd3e5; "><span style="word-wrap: normal; word-break: normal; "><span style="word-wrap: normal; word-break: normal; "><span style="word-wrap: normal; word-break: normal; line-height: 19px; "><strong>Problem:</strong></span></span></span></p><p style="margin: 1em 0px 0.5em; padding: 0px; border: 0px; list-style: none; word-wrap: normal; word-break: normal; color: #464646; font-family: simsun; background-color: #bcd3e5; "><span style="word-wrap: normal; word-break: normal; "><span style="word-wrap: normal; word-break: normal; line-height: 19px; font-size: small; ">Given a circularly sorted array (for example 3,4,5,6,7,8,9,0,1,2) in which the position of the minimum is unknown, find the position of any given value. The algorithm must run in O(log n) time.</span></span></p><p style="margin: 1em 0px 0.5em; padding: 0px; border: 0px; list-style: none; word-wrap: normal; word-break: normal; 
color: #464646; font-family: simsun; background-color: #bcd3e5; "><span style="word-wrap: normal; word-break: normal; "><br /></span><span style="word-wrap: normal; word-break: normal; "><span style="word-wrap: normal; word-break: normal; line-height: 19px; font-size: small; "><strong>Analysis:</strong></span></span></p><p style="margin: 1em 0px 0.5em; padding: 0px; border: 0px; list-style: none; word-wrap: normal; word-break: normal; color: #464646; font-family: simsun; background-color: #bcd3e5; "><span style="word-wrap: normal; word-break: normal; "><span style="word-wrap: normal; word-break: normal; line-height: 19px; font-size: small; ">Split the array into a left and a right half at mid = (low + high) / 2. Because the array is a rotated sorted sequence, at least one of the two halves must be sorted. Find the sorted half and check whether the target lies within its range: if it does, run an ordinary binary search on that half; if not, apply the same search recursively to the other half. Each step discards half of the range, so the search takes O(log n) time. This assumes the values are distinct; with duplicates the sorted half cannot always be identified, and the worst case degrades to O(n).</span></span></p><p style="margin: 1em 0px 0.5em; padding: 0px; border: 0px; list-style: none; word-wrap: normal; word-break: normal; color: #464646; font-family: simsun; background-color: #bcd3e5; "><span style="word-wrap: normal; word-break: normal; "><span style="word-wrap: normal; word-break: normal; line-height: 19px; font-size: small; "><strong>Code:</strong></span></span></p><pre style="margin: 1em 0px 0.5em; padding: 0px; border: 0px; color: #464646; background-color: #bcd3e5; ">#include &lt;iostream&gt;
using namespace std;

// Ordinary binary search on the sorted range a[low..high]; returns the index or -1.
int binarySearch(int a[], int low, int high, int value)
{
    if (low &gt; high)
        return -1;
    int mid = low + (high - low) / 2;
    if (value == a[mid])
        return mid;
    else if (value &gt; a[mid])
        return binarySearch(a, mid + 1, high, value);
    else
        return binarySearch(a, low, mid - 1, value);
}

// Search a circularly sorted range a[low..high] of distinct values; returns the index or -1.
int Search(int a[], int low, int high, int value)
{
    if (low &gt; high)                    // empty range: value is not present
        return -1;
    int mid = low + (high - low) / 2;
    if (a[low] &lt;= a[mid])              // left half a[low..mid] is sorted
    {
        if (a[low] &lt;= value &amp;&amp; value &lt;= a[mid])
            return binarySearch(a, low, mid, value);   // value lies in the sorted left half
        else
            return Search(a, mid + 1, high, value);    // otherwise it can only be on the right
    }
    else                                // right half a[mid..high] is sorted
    {
        if (a[mid] &lt;= value &amp;&amp; value &lt;= a[high])
            return binarySearch(a, mid, high, value);  // value lies in the sorted right half
        else
            return Search(a, low, mid - 1, value);     // otherwise it can only be on the left
    }
}

int main()
{
    int a[] = {3, 4, 5, 6, 7, 8, 9, 0, 1, 2};
    cout &lt;&lt; Search(a, 0, 9, 0) &lt;&lt; endl;    // prints 7
    cout &lt;&lt; Search(a, 0, 9, 10) &lt;&lt; endl;   // prints -1: 10 is not in the array
    return 0;
}</pre><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/humanchao/" target="_blank">胡满超</a> 2012-12-26 16:15 <a href="http://www.cppblog.com/humanchao/archive/2012/12/26/196686.html#Feedback" target="_blank" 
style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Repost: Endianness</title><link>http://www.cppblog.com/humanchao/archive/2012/12/26/196684.html</link><dc:creator>胡满超</dc:creator><author>胡满超</author><pubDate>Wed, 26 Dec 2012 08:06:00 GMT</pubDate><guid>http://www.cppblog.com/humanchao/archive/2012/12/26/196684.html</guid><wfw:comment>http://www.cppblog.com/humanchao/comments/196684.html</wfw:comment><comments>http://www.cppblog.com/humanchao/archive/2012/12/26/196684.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/humanchao/comments/commentRss/196684.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/humanchao/services/trackbacks/196684.html</trackback:ping><description><![CDATA[<p align="center" style="text-align:center;"></p><div style="text-align: left;"><font face="楷体_GB2312"><span style="font-size: 21px;"><strong>Source: </strong></span></font><a href="http://wenku.baidu.com/view/9e2d2f3e5727a5e9856a6167.html">http://wenku.baidu.com/view/9e2d2f3e5727a5e9856a6167.html</a><br /><font face="楷体_GB2312"><span style="font-size: 21px;"><strong><br /></strong></span></font></div><strong><span style="font-size:16.0pt;font-family: 楷体_GB2312;Times New Roman&quot;;">Endianness</span></strong><strong></strong><p>&nbsp;</p>  <p align="left"><strong><span style="font-size:12.0pt; font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">By unanao</span></strong></p>  <p align="left"><strong><span style="font-size:12.0pt; font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&lt;sunjianjiao@gmail.com&gt;</span></strong></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-size:12.0pt;font-family:楷体_GB2312;Times New Roman&quot;;">1. What Is Endianness?</span></p>  <p align="left"><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">(From</span><span style="font-size:12.0pt;font-family:楷体_GB2312;Times New Roman&quot;;"> “</span><span style="font-size:12.0pt; 
font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">Computer Systems: A Programmer's Perspective</span><span style="font-size:12.0pt;font-family:楷体_GB2312;Times New Roman&quot;;">”</span><span style="font-size:12.0pt; font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">)</span><span style="font-size: 12.0pt;font-family:楷体_GB2312;Times New Roman&quot;;"> On almost all machines, a multi-byte object is stored as a contiguous sequence of bytes, and the address of the object is the lowest address of the bytes it occupies.</span></p>  <p align="left" style="text-indent: 24pt; line-height: 18pt; "><span style="font-size:12.0pt;font-family:楷体_GB2312;Times New Roman&quot;;">Little endian: some machines store an object in memory ordered from its least significant byte to its most significant byte. This representation, with the least significant byte first, is called </span><strong><em><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">little endian</span></em></strong><span style="font-size: 12.0pt;font-family:楷体_GB2312;Times New Roman&quot;;Times New Roman&quot;;">.</span><span style="font-size: 12pt; font-family: 楷体_GB2312; letter-spacing: 0.4pt; "> This layout ties address order to numeric weight: the bytes at higher addresses carry the higher weights and the bytes at lower addresses carry the lower weights, consistent with how we reason about positional weight.</span></p>  <p align="left" style="line-height: 18pt; "><span style="font-size:12.0pt; font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span><span style="font-size:12.0pt; font-family:楷体_GB2312;Times New Roman&quot;;">Big endian: other machines store it ordered from the most significant byte to the least significant byte. This representation, with the most significant byte first, is called </span><strong><em><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">big endian</span></em></strong><span style="font-size: 12.0pt;font-family:楷体_GB2312;Times New Roman&quot;;Times New Roman&quot;;">.</span><span style="font-size: 12pt; font-family: 楷体_GB2312; letter-spacing: 0.4pt; "> This layout is somewhat like processing the data in order as a string: addresses increase from low to high, while the data is laid out from its high-order byte down to its low-order byte.</span></p>  <p>&nbsp;</p>  <p>&nbsp;<span style="font-size:12.0pt; 
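"></span></p>  <p align="left"><span style="font-size:12.0pt;">A minimal C++ sketch of the two layouts just defined, assuming a 32-bit value: store32 writes the bytes of a value into a 4-byte buffer in either order, with buf[i] standing in for the byte at address base + i, and is_little_endian reads the lowest-addressed byte of a variable to report the byte order of the machine running the code. Both function names are hypothetical.</span></p>
<pre>
```cpp
// Write the low 32 bits of v into buf[0..3] in the chosen byte order;
// buf[i] stands in for the byte at address base + i.
void store32(unsigned long v, unsigned char buf[4], bool big_endian)
{
    for (int i = 0; i != 4; ++i) {
        // Big endian: most significant byte at the lowest address.
        // Little endian: least significant byte at the lowest address.
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        buf[i] = (unsigned char)((v >> shift) & 0xFF);
    }
}

// Report the byte order of the running machine by reading the byte at the
// lowest address of x, which is the address of x itself.
bool is_little_endian()
{
    unsigned long x = 0x01234567;          // unsigned long has at least 32 bits
    const unsigned char *lowest = (const unsigned char *)(&x);
    return *lowest == 0x67;                // least significant byte stored first?
}
```
</pre>  <p align="left"><span style="font-size:12.0pt;">For the value 0x01234567, store32 produces 01 23 45 67 in big-endian order and 67 45 23 01 in little-endian order; on an x86 machine, which is little endian, is_little_endian() returns true.</span></p>  <p>&nbsp;<span style="font-size:12.0pt; 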
font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">An example illustrates the difference. Suppose an int x is stored at address 0x100 and holds the value 0x01234567. Its bytes occupy addresses 0x100, 0x101, 0x102 and 0x103, arranged as shown below:</span></p>  <p><br /><img src="http://www.cppblog.com/images/cppblog_com/humanchao/新建位图图像.jpg" width="601" height="180" alt="Bytes of 0x01234567 at addresses 0x100-0x103: big endian 01 23 45 67, little endian 67 45 23 01" /><br /><br /></p>  
<p align="left" style="line-height: 18pt; "><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">II. Why are there big-endian and little-endian modes?</span></p>  <p align="left" style="text-indent: 25.2pt; line-height: 18pt; "><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">Memory in a computer system is addressed in bytes: each address unit corresponds to one byte, and a byte is 8 bits. In C, however, there are other types besides the 8-bit char. These include the 16-bit short and the 32-bit long, although the exact sizes depend on the compiler. Processors wider than 8 bits, such as 16-bit or 32-bit processors, also have registers wider than one byte. There therefore has to be a rule for arranging the bytes of a multi-byte value in memory, and that is where the big-endian and little-endian storage modes come from. For example, take a 16-bit short x at address 0x0010 whose value is 0x1122. Here 0x11 is the high byte and 0x22 is the low byte. Big-endian mode puts 0x11 at the low address 0x0010 and 0x22 at the high address 0x0011, and little-endian mode does the opposite. The widely used x86 architecture is little-endian, whereas KEIL C51 is big-endian. Many ARM and DSP processors run in little-endian mode, and some ARM processors let the hardware select either mode.</span></p>  <p>&nbsp;</p>  
<p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">III. How to tell whether a machine is big-endian or little-endian</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">Method 1:</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">#include &lt;stdio.h&gt;</span></p>  
<p>&nbsp;</p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">int main(void)</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">{</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int i = 1;</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; unsigned char *pointer;</span></p>  <p>&nbsp;</p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; pointer = (unsigned char *)&amp;i;&nbsp; /* points at the lowest-addressed byte of i */</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (*pointer)</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf("little endian\n");</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf("big endian\n");</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</span></p>  <p>&nbsp;</p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return 0;</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">}</span></p>  
<p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; In C, every object occupies memory starting at its lowest address and extending toward higher addresses. The address-of operator "&amp;" always yields that lowest address. On a little-endian machine (i is at least two bytes long), the byte at the lowest address of i holds 1 and every other byte holds 0. On a big-endian machine, the 1 sits in the byte at the highest address of i. An unsigned char is exactly one byte, so once &amp;i is cast to unsigned char *, pointer is guaranteed to point at the lowest-addressed byte of i. Checking whether *pointer is 1 therefore tells us whether the machine is little-endian.</span></p>  <p>&nbsp;</p>  
<p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">Method 2:</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">#include &lt;stdio.h&gt;</span></p>  <p>&nbsp;</p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">int main(void)</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">{</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; union {</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; short a;</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; char ch;</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; } u;</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; u.a = 1;</span></p>  <p>&nbsp;</p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (u.ch == 1)</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf("Little endian\n");</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf("Big endian\n");</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</span></p>  <p>&nbsp;</p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return 0;</span></p>  <p><span style="font-size: 12pt; font-family: 'Times New Roman', serif; ">}</span></p>  
<p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; This relies on a property of unions: all members share the same storage, and every member starts at the same address, which is the beginning of the union. Reading the char member ch therefore picks out the byte of a that sits at the lowest address.</span></p>  <p>&nbsp;</p>  
<p align="left"><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">IV. When byte order must be taken into account</span></p>  <p align="left"><strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">1. </span></strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">When a program must be ported to other hardware platforms, any of which may be big-endian or little-endian. To keep the code portable, plan for byte order from the start.</span></p>  <p align="left"><strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">2. 
</span></strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">When binary data is sent over a network between different kinds of machines. A common problem arises when data produced by a little-endian machine is sent to a big-endian machine, or vice versa: the receiving program finds the bytes within each word in reverse order. To avoid this, networking code must follow the established byte-order convention. The sending machine converts its internal representation to the network standard (network byte order, which is big-endian), and the receiving machine converts the network standard back to its own internal representation.</span></p>  <p align="left"><strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">3. </span></strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">When reading byte sequences that represent integers. This typically happens when inspecting machine-level programs. For example, consider this instruction taken from a disassembly:<br /> 80483bd: 01 05 64 94 04 08&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; add %eax, 0x8049464<br />Its last four bytes, 64 94 04 08, are the operand address 0x08049464 stored in little-endian order.</span></p>  
<p align="left"><strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">4. </span></strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">When writing programs that use casts to circumvent the normal type system, such as data written as a u32 but read back as a char. For the u32 value 0x12345678, the byte read at the lowest address is 0x12 on a big-endian machine but 0x78 on a little-endian machine.</span></p>  <p align="left"><strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">V. Improving portability</span></strong></p>  <p align="left"><strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">Use conditional compilation</span></strong></p>  <p align="left"><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">Here LITTLE_ENDIAN is assumed to be a macro that the build defines only for little-endian targets. It is not a standard C macro. GCC and Clang predefine __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__, which can be compared with #if instead.</span></p>  <p align="left"><strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">#ifdef LITTLE_ENDIAN</span></strong></p>  <p align="left"><strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">//</span></strong><strong><span style="font-size:12.0pt;font-family:&quot;Times New 
Roman&quot;;"> little-endian code</span></strong></p>  <p align="left"><strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">#else</span></strong></p>  <p align="left"><strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">// big-endian code</span></strong></p>  <p align="left"><strong><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">#endif</span></strong></p>  <p>&nbsp;</p>  
<p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">VI. Converting between big-endian and little-endian</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">1. Little-endian to big-endian. The function reverses the byte order, so it converts big-endian to little-endian just as well.</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">#include &lt;stdio.h&gt;</span></p>  <p>&nbsp;</p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">/* print the bytes of an object in order of increasing address */</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">void show_byte(const unsigned char *addr, size_t len)</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">{</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; size_t i;</span></p>  <p>&nbsp;</p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for (i = 0; i &lt; len; i++)</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf("%.2x \t", (unsigned int)addr[i]);</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf("\n");</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">}</span></p>  <p>&nbsp;</p>  
<p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">/* reverse the byte order of t; unsigned arithmetic avoids signed-shift overflow */</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">unsigned int endian_convert(unsigned int t)</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">{</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; unsigned int result;</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; size_t i;</span></p>  <p>&nbsp;</p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; result = 0;</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for (i = 0; i &lt; sizeof(t); i++)</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; result &lt;&lt;= 8;&nbsp; /* make room for the next byte */</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; result |= (t &amp; 0xFF);&nbsp; /* append the current lowest byte of t */</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; t &gt;&gt;= 8;</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</span></p>  <p>&nbsp;</p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return result;</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">}</span></p>  <p>&nbsp;</p>  
<p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">int main(void)</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">{</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; unsigned int i;</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; unsigned int ret;</span></p>  <p>&nbsp;</p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; i = 0x01234567;</span></p>  <p>&nbsp;</p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; show_byte((const unsigned char *)&amp;i, sizeof(i));</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ret = endian_convert(i);</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; show_byte((const unsigned char *)&amp;ret, sizeof(ret));</span></p>  <p>&nbsp;</p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New 
Roman&quot;,&quot;serif&quot;;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return 0;</span></p>  <p><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;">}</span></p>  <p>&nbsp;</p><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/humanchao/" target="_blank">胡满超</a> 2012-12-26 16:06 <a href="http://www.cppblog.com/humanchao/archive/2012/12/26/196684.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Repost: Card shuffling (poker) simulation program</title><link>http://www.cppblog.com/humanchao/archive/2012/12/26/196683.html</link><dc:creator>胡满超</dc:creator><author>胡满超</author><pubDate>Wed, 26 Dec 2012 07:59:00 GMT</pubDate><guid>http://www.cppblog.com/humanchao/archive/2012/12/26/196683.html</guid><wfw:comment>http://www.cppblog.com/humanchao/comments/196683.html</wfw:comment><comments>http://www.cppblog.com/humanchao/archive/2012/12/26/196683.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/humanchao/comments/commentRss/196683.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/humanchao/services/trackbacks/196683.html</trackback:ping><description><![CDATA[<p>Source: <a href="http://www.fredosaurus.com/notes-cpp/misc/random-shuffle.html">http://www.fredosaurus.com/notes-cpp/misc/random-shuffle.html</a><br /><br />// File&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : misc/random/deal.cpp - Randomly shuffle deck of cards.</p>  <p>// Illustrates : Shuffle algorithm, srand, rand.</p>  <p>// Improvements: Use classes for Card and Deck.</p>  <p>// Author&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Fred Swartz 2003-08-24, shuffle correction 2007-01-18</p>  <p>//&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Placed in the public domain.</p>  <p>&nbsp;</p>  
<p>#include &lt;iostream&gt;</p>  <p>#include &lt;cstdlib&gt;&nbsp;&nbsp; // for srand and rand</p>  <p>#include &lt;ctime&gt;&nbsp;&nbsp;&nbsp;&nbsp; // for time</p>  <p>using namespace std;</p>  <p>&nbsp;</p>  <p>int main() {</p>  <p>&nbsp;&nbsp;&nbsp; int card[52];&nbsp;&nbsp;&nbsp; // array of cards</p>  <p>&nbsp;&nbsp;&nbsp; int n;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; // number of cards to deal</p>  <p>&nbsp;&nbsp;&nbsp; srand(time(0));&nbsp; // initialize seed "randomly"</p>  <p>&nbsp;</p>  <p>&nbsp;&nbsp;&nbsp; for (int i=0; i&lt;52; i++) {</p>  <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; card[i] = i;&nbsp; // fill the array in order</p>  <p>&nbsp;&nbsp;&nbsp; }</p>  <p>&nbsp;</p>  <p>&nbsp;&nbsp;&nbsp; while (cin &gt;&gt; n) {</p>  <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; //--- Shuffle elements by randomly exchanging each with one other.</p>  <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for (int i=0; i&lt;(52-1); i++) {</p>  <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int r = i + (rand() % (52-i)); // Random remaining position.</p>  <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int temp = card[i]; card[i] = card[r]; card[r] = temp;</p>  <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</p>  <p>&nbsp;</p>  <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; //--- Print first n cards as ints (never more than the 52 in the deck).</p>  <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for (int c=0; c&lt;n &amp;&amp; c&lt;52; c++) {</p>  <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cout &lt;&lt; card[c] &lt;&lt; " ";&nbsp; // Just print number</p>  <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</p>  <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cout &lt;&lt; endl;</p>  <p>&nbsp;&nbsp;&nbsp; }</p>  <p>&nbsp;</p>  <p>&nbsp;&nbsp;&nbsp; return 0;</p>  <p>}</p><br><br><div align=right><a 
style="text-decoration:none;" href="http://www.cppblog.com/humanchao/" target="_blank">胡满超</a> 2012-12-26 15:59 <a href="http://www.cppblog.com/humanchao/archive/2012/12/26/196683.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>转：海明距离</title><link>http://www.cppblog.com/humanchao/archive/2012/12/26/196680.html</link><dc:creator>胡满超</dc:creator><author>胡满超</author><pubDate>Wed, 26 Dec 2012 07:49:00 GMT</pubDate><guid>http://www.cppblog.com/humanchao/archive/2012/12/26/196680.html</guid><wfw:comment>http://www.cppblog.com/humanchao/comments/196680.html</wfw:comment><comments>http://www.cppblog.com/humanchao/archive/2012/12/26/196680.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/humanchao/comments/commentRss/196680.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/humanchao/services/trackbacks/196680.html</trackback:ping><description><![CDATA[<p style="color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff; ">转自：<a href="http://blog.csdn.net/fuyangchang/article/details/5637464">http://blog.csdn.net/fuyangchang/article/details/5637464</a><br />wiki地址<a title="http://en.wikipedia.org/wiki/Hamming_distance" href="http://en.wikipedia.org/wiki/Hamming_distance" style="color: #336699; text-decoration: none; ">http://en.wikipedia.org/wiki/Hamming_distance</a></p><p style="color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff; ">在信息领域，<strong>两个长度相等的字符串</strong>的海明距离是在相同位置上不同的字符的个数，也就是将一个字符串替换成另一个字符串需要的替换的次数。</p><p style="color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff; ">例如：</p><ul style="color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff; "><li>"<strong>toned</strong>" and "<strong>roses</strong>" is 3.</li><li><strong>1011101</strong>&nbsp;and&nbsp;<strong>1001001</strong>&nbsp;is 
2.</li><li><strong>2173896</strong>&nbsp;and&nbsp;<strong>2233796</strong>&nbsp;is 3.</li></ul><p style="color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff; ">For binary strings, the Hamming distance equals the number of 1 bits in&nbsp;<em>a</em>&nbsp;<a href="http://en.wikipedia.org/wiki/Exclusive_OR" style="color: #336699; text-decoration: none; ">XOR</a>&nbsp;<em>b</em>.</p><p style="color: #333333; font-family: Arial; line-height: 26px; background-color: #ffffff; "><p>Python code:</p>  <p>&nbsp;</p>  <p>def hamming_distance(s1, s2):</p>  <p>&nbsp;&nbsp;&nbsp; assert len(s1) == len(s2)</p>  <p>&nbsp;&nbsp;&nbsp; return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))</p>  <p>&nbsp;</p>  <p>print(hamming_distance("gdad", "glas"))</p>  <p>Output: 2</p>  <p>&nbsp;</p>  <p>C code (hamdist compares the bits of two unsigned integers, so the example encodes the binary strings 1011101 and 1001001 from above):</p>  <p>&nbsp;</p>  <p>#include &lt;stdio.h&gt;</p>  <p>&nbsp;</p>  <p>unsigned hamdist(unsigned x, unsigned y)</p>  <p>{</p>  <p>&nbsp; unsigned dist = 0, val = x ^ y;&nbsp; // 1 bits mark the positions where x and y differ</p>  <p>&nbsp;</p>  <p>&nbsp; // Count the number of set bits</p>  <p>&nbsp; while(val)</p>  <p>&nbsp; {</p>  <p>&nbsp;&nbsp;&nbsp; ++dist; </p>  <p>&nbsp;&nbsp;&nbsp; val &amp;= val - 1;&nbsp; // clear the lowest set bit</p>  <p>&nbsp; }</p>  <p>&nbsp;</p>  <p>&nbsp; return dist;</p>  <p>}</p>  <p>&nbsp;</p>  <p>int main()</p>  <p>{</p>  <p>&nbsp;&nbsp;&nbsp; unsigned x = 0x5D;&nbsp; // binary 1011101</p>  <p>&nbsp;&nbsp;&nbsp; unsigned y = 0x49;&nbsp; // binary 1001001</p>  <p>&nbsp;&nbsp;&nbsp; unsigned z = hamdist(x, y);</p>  <p>&nbsp;&nbsp;&nbsp; printf("%u\n", z);&nbsp; // prints 2</p>  <p>&nbsp;&nbsp;&nbsp; return 0;</p>  <p>}</p></p><img src ="http://www.cppblog.com/humanchao/aggbug/196680.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/humanchao/" target="_blank">胡满超</a> 2012-12-26 15:49 <a href="http://www.cppblog.com/humanchao/archive/2012/12/26/196680.html#Feedback" target="_blank" 
style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>转: 怎样量化评价搜索引擎的结果质量</title><link>http://www.cppblog.com/humanchao/archive/2012/12/19/196436.html</link><dc:creator>胡满超</dc:creator><author>胡满超</author><pubDate>Wed, 19 Dec 2012 03:03:00 GMT</pubDate><guid>http://www.cppblog.com/humanchao/archive/2012/12/19/196436.html</guid><wfw:comment>http://www.cppblog.com/humanchao/comments/196436.html</wfw:comment><comments>http://www.cppblog.com/humanchao/archive/2012/12/19/196436.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/humanchao/comments/commentRss/196436.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/humanchao/services/trackbacks/196436.html</trackback:ping><description><![CDATA[<h2>转自：<a href="http://www.infoq.com/cn/articles/cyw-evaluate-seachengine-result-quality">http://www.infoq.com/cn/articles/cyw-evaluate-seachengine-result-quality</a><br /><br />前言</h2><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">搜索质量评估是搜索技术研究的基础性工作，也是核心工作之一。评价（Metrics）在搜索技术研发中扮演着重要角色，以至于任何一种新方法与他们的评价方式是融为一体的。</p><div style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><br />搜索引擎结果的好坏与否，体现在业界所称的在相关性（Relevance）上。相关性的定义包括狭义和广义两方面，狭义的解释是：检索结果和用户查询的相关程度。而从广义的层面，相关性可以理解为为用户查询的综合满意度。直观的来看，从用户进入搜索框的那一刻起，到需求获得满足为止，这之间经历的过程越顺畅，越便捷，搜索相关性就越好。本文总结业界常用的相关性评价指标和量化评价方法。供对此感兴趣的朋友参考。</div><h2>Cranfield评价体系</h2><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">A Cranfield-like approach这个名称来源于英国Cranfield University，因为在二十世纪五十年代该大学首先提出了这样一套评价系统：由查询样例集、正确答案集、评测指标构成的完整评测方案，并从此确立了&#8220;评价&#8221;在信息检索研究中的核心地位。</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Cranfield评价体系由三个环节组成：</p><ol 
style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><li>抽取代表性的查询词，组成一个规模适当的集合</li><li>针对查询样例集合，从检索系统的语料库中寻找对应的结果，进行标注（通常人工进行）</li><li>将查询词和带有标注信息的语料库输入检索系统，对系统反馈的检索结果，使用预定义好的评价计算公式，用数值化的方法来评价检索系统结果和标注的理想结果的接近程度</li></ol><h2>查询词集合的选取</h2><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Cranfield评价系统在各大搜索引擎公司内有广泛的应用。具体应用时，首先需要解决的问题是构造一个测试用查询词集合。</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">按照Andrei Broder（曾在AltaVista/IBM/Yahoo任职）的研究，查询词可分为3类：寻址类查询（Navigational）、信息类查询(Informational)、事务类查询(Transactional)。对应的比例分别为</p><pre style="overflow: auto; width: 964.25px; padding: 0px 0px 5px; font-size: 12px; line-height: 15px; font-family: 'Courier New', Courier; color: #222222; margin-top: 0px; margin-bottom: 0px; background-color: #fafafa; border: 2px solid #efefef; ">Navigational ： 12.3%  Informational ： 62.0%  Transactional ： 25.7% </pre><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">为了使得评估符合线上实际情况，通常查询词集合也会按比例进行选取。通常从线上用户的Query Log文件中自动抽取。</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">另外查询集合的构造时，除了上述查询类型外，还可以考虑Query的频次，对热门query（高频查询）、长尾query（中低频）分别占特定的比例。</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">另外，在抽取Query时，往往Query的长短也是一个待考虑的因素。因为短query（单term的查询）和长Query（多Term的查询）排序算法往往会有一些不同。</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">构成查询集合后，使用这些查询词，在不同系统（例如对比百度和Google）或不同技术间（新旧两套Ranking算法的环境）进行搜索，并对结果进行评分，以决定优劣。</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; 
附图">
font-size: 13px; line-height: 19px; background-color: #ffffff; ">Figures: results returned by the major search engines for the same query, &#8220;社会保险法&#8221; (Social Insurance Law). The scoring methods are discussed in detail below.</p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image1.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image2.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image3.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image4.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; 
<img">
font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image5.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image6.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><h2>The Precision-Recall Method</h2><h3>How It Is Computed</h3><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">The best-known evaluation measure in information retrieval is Precision-Recall. It was proposed about half a century ago and is still used by many search engine companies to evaluate search quality.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">As the name suggests, the method is built from two related statistics, precision and recall. Recall measures a query's ability to retrieve all relevant documents. Precision measures the search system's ability to exclude non-relevant documents. (In plain terms, precision asks how many of the results you got are actually good, and recall asks how many of all the good results you managed to find.) These are the two most basic measures of search effectiveness, and they are computed as follows.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">For a given query, the Precision-Recall method assumes that the document collection is split in two ways: into relevant and non-relevant documents, and into retrieved and non-retrieved documents. Relevance is assumed to be binary. Formally:</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">A denotes the set of relevant documents</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><span style="text-decoration: overline; ">A</span> denotes the set of non-relevant documents</p><p 
style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">B表示被检索到的文档集合</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><span style="text-decoration: overline; ">B</span>表示未被检索到的文档集合</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">则单次查询的准确率和召回率可以用下述公式来表达：</p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img alt="" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image7.jpg" _href="img://image7.jpg" _p="true" style="border: 0px; " /></p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">（运算符&#8745; 表示两个集合的交集。|x|符号表示集合x中的元素数量）</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">从上面的定义不难看出，召回率和准确率的取值范围均在[0,1]之间。那么不难想象，如果这个系统找回的相关越多，那么召回率越高，如果相关结果全部都给召回了，那么recall此时就等于1.0。</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "></p><table cellspacing="0" cellpadding="0" border="1" style="color: #000000; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; text-align: start; background-color: #ffffff; "><tbody><tr><td width="156" valign="top" style="font-size: small; ">&nbsp;</td><td width="156" valign="top" style="font-size: small; "><p align="center">相关的</p></td><td width="156" valign="top" style="font-size: small; "><p align="center">不相关</p></td></tr><tr><td width="156" valign="top" style="font-size: small; "><p align="center">被检索到</p></td><td width="156" 
valign="top" style="font-size: small; "><p align="center">A&#8745; B</p></td><td width="156" valign="top" style="font-size: small; "><p align="center"><span style="text-decoration: overline; ">A</span>&#8745; B</p></td></tr><tr><td width="156" valign="top" style="font-size: small; "><p align="center">未被检索到</p></td><td width="156" valign="top" style="font-size: small; "><p align="center">A&#8745;<span style="text-decoration: overline; ">B</span></p></td><td width="156" valign="top" style="font-size: small; "><p align="center"><span style="text-decoration: overline; ">A</span>&#8745;<span style="text-decoration: overline; ">B</span></p></td></tr></tbody></table><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "></p><h3>Precision-Recall曲线</h3><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">召回率和准确率分别反映了检索系统的两个最重要的侧面，而这两个侧面又相互制约。因为大规模数据集合中，如果期望检索到更多相关的文档，必然需要&#8220;放宽&#8221;检索标准，因此会导致一些不相关结果混进来，从而使准确率受到影响。类似的，期望提高准确率，将不相关文档尽量去除时，务必要执行更&#8220;严格&#8221;的检索策略，这样也会使一些相关的文档被排除在外，使召回率下降。</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">所以为了更清晰的描述两者间的关系，通常我们将Precison-Recall用曲线的方式绘制出来，可以简称为P-R diagram。常见的形式如下图所示。（通常曲线是一个逐步向下的走势，即随着Recall的提高，Precision逐步降低）</p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image8.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><h3>P-R的其它形态</h3><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; 
一些特定搜索应用">
background-color: #ffffff; ">Some search applications care more about the erroneous results in the result list. For example, a search engine's anti-spam system cares about how many spam results have slipped into the results. Such erroneous results are called false positives, and applications like these usually measure them with fallout:</p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img alt="" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image9.jpg" _href="img://image9.jpg" _p="true" style="border: 0px; " /></p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Fallout is a variant of P-R that looks at the retrieved results from the opposite side. Instead of counting the relevant documents that were retrieved, it counts the non-relevant ones. It is not simply the complement of precision, however. Under its usual definition, fallout is the fraction of all non-relevant documents that get retrieved, |<span style="text-decoration: overline; ">A</span>&#8745; B| / |<span style="text-decoration: overline; ">A</span>|.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Returning to the figure above: Precision-Recall is a curve, which is often not intuitive enough for comparing two methods. Can the two measures be combined into a single number? For this purpose, the IR research community proposed the F-measure, computed as the harmonic mean of precision and recall:</p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image10.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Here the parameter &#955; &#8712; (0,1) sets the balance between precision and recall. (Usually &#955; = 0.5, in which case&nbsp;<img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image11-1.jpg" alt="" style="border: 0px; " />, i.e. F = 2PR / (P + R).)</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; 
这里使用调和平均数">
">The harmonic mean is used here instead of the usual geometric or arithmetic mean because it is dominated by the smaller value and reacts sensitively to changes in it. That makes it better suited to reflecting retrieval effectiveness. For example, with P = 1.0 and R = 0.1, the arithmetic mean is 0.55, while the harmonic mean is only about 0.18.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">The advantage of the F-measure is that a single number summarizes a system's retrieval effectiveness, which makes it easy to compare the overall effectiveness of different search systems.</p><h2>The P@N Method</h2><h3>Click Behavior</h3><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Traditional Precision-Recall is not entirely suitable for evaluating search engines, because search engine users click in distinctive ways:</p><pre style="overflow: auto; width: 964.25px; padding: 0px 0px 5px; font-size: 12px; line-height: 15px; font-family: 'Courier New', Courier; color: #222222; margin-top: 0px; margin-bottom: 0px; background-color: #fafafa; border: 2px solid #efefef; ">A. 60-65% of queries lead to clicks on pages ranked in the top 10;  B. 20-25% of users consider clicking pages ranked 11 to 20;  C. only 3-4% click pages ranked 21 to 30. </pre><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">In other words, the vast majority of users are unwilling to page through to the later results.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Even on the first results page (usually the top 10 results), user click behavior is revealing. Consider the Google click heat map below. It uses a color spectrum to show how often users click at each position of the two-dimensional results page: the closer a color is to red, the more clicks.</p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image12.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">The figure shows that the top 3 results attract a large share of the clicks and form the hottest region. In other words, for a search engine the first few results are the most critical, and they determine user satisfaction.</p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; 
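<img border="0"">
font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image13.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Researchers at Cornell University used eye-tracking experiments to produce a more precise analysis of user behavior on Google search results. The figure shows that the first result receives 56.38% of the search traffic. The second and third results receive progressively fewer clicks, far fewer than the first. The click ratio of the top three results is roughly 11:3:2, and together they take almost 80% of the search traffic.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">The study produced some other interesting findings. Clicks do not decrease strictly with rank: the 7th result receives the fewest clicks. This may be because users scroll to the bottom of the page, where only the last three results are shown, so the 7th is easily overlooked. The last result on the first page gets more attention (2.55) than the second-to-last (1.45), because users remember the last result better just before turning the page. The first result on the second page (11th overall) receives only 40% of the clicks of the 10th result on the first page, and only 1/60 to 1/100 of the clicks of the very first result.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Quantitative evaluation of a search engine therefore usually has to be designed around these characteristics of user behavior.</p><h3>Computing P@N</h3><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">P@N, short for Precision@N, is the precision of the top N results for a given query. It takes rank into account through the cutoff N. For example, if 4 of the top 5 results of a search are relevant, then P@5 = 4/5 = 0.8.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">A test usually uses a query set (built as described above) that contains many different queries. When evaluating with P@N in practice, the P@N values of all queries are usually averaged arithmetically to judge the overall result quality of the system.</p><h3>Choosing N</h3><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Users usually look only at the first few results, so search engine evaluation typically focuses on the top 5 or top 3 results. Common choices are therefore P@3 and P@5.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; 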
对一些特定类型">
font-size: 13px; line-height: 19px; background-color: #ffffff; ">For certain types of queries, such as navigational queries, the target result is unambiguous, so N = 1 (P@1) is used. For example, for the query &#8220;新浪网&#8221; (Sina) or &#8220;新浪首页&#8221; (Sina homepage), if the first result is not Sina (URL: <a href="http://www.sina.com.cn/" style="color: #0b59b2; ">www.sina.com.cn</a>), the query is judged a failure, i.e., P@1 = 0.</p><h2>MRR</h2><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">The P@N method is easy to compute and understand. A careful reader will notice a problem, though. Within the top N results, a result at rank 1 and a result at rank N affect precision equally. In reality, search engine evaluation depends heavily on rank: an error at rank 1 is far more serious than an error at rank 10. The evaluation therefore needs to take position into account.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">MRR stands for Mean Reciprocal Rank. It is mainly used for navigational search and question answering. These tasks need only one relevant document, so they are insensitive to recall. What matters is whether the relevant document is ranked near the top of the result list. MRR first computes, for each query, the reciprocal of the rank of the first relevant document, and then averages these reciprocals over all queries. For example, take a test set of three queries whose top 5 results are:</p><pre style="overflow: auto; width: 964.25px; padding: 0px 0px 5px; font-size: 12px; line-height: 15px; font-family: 'Courier New', Courier; color: #222222; margin-top: 0px; margin-bottom: 0px; background-color: #fafafa; border: 2px solid #efefef; ">Query 1: 1.AN 2.AR 3.AN 4.AN 5.AR  Query 2: 1.AN 2.AR 3.AR 4.AR 5.AN  Query 3: 1.AR 2.AN 3.AN 4.AN 5.AR  </pre><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Here AN denotes a non-relevant result and AR a relevant one. The reciprocal rank of the first query is RR<sub>1</sub>&nbsp;= 1/2 = 0.5, and for the second query RR<sub>2</sub>&nbsp;= 1/2 = 0.5. Note that the value is the same even though query 2 returns more relevant results. Likewise, RR<sub>3</sub>&nbsp;= 1/1 = 1. For this test set, MRR = (RR<sub>1</sub>&nbsp;+ RR<sub>2</sub>&nbsp;+ RR<sub>3</sub>) / 3 = 2/3 &#8776; 0.67.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">For most retrieval applications, however, a single result is not enough to satisfy the need. Such cases call for a more suitable measure, and the most common one is MAP, described below.</p><h2>MAP</h2><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; 
MAP方法是">
background-color: #ffffff; ">MAP stands for Mean Average Precision. For each query, precision is computed at the rank of every relevant document, and these values are averaged to give the Average Precision (AP). MAP is the arithmetic mean (Mean) of AP over all queries. Precision is thus averaged twice, hence the name Mean Average Precision. (It is not called Average Average Precision partly because that sounds awkward, and partly because such a name would not distinguish the meanings of the two averages.)</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">MAP is a single-value measure of a system's performance over all relevant documents. The higher the system ranks the relevant documents, the higher MAP should be. For any relevant document that the system does not return, the precision counts as 0.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">For example, suppose there are two topics:</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Topic 1 has 4 relevant pages, and topic 2 has 5 relevant pages.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">For topic 1, a system retrieves all 4 relevant pages, at ranks 1, 2, 4, and 7;</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">for topic 2, it retrieves 3 relevant pages, at ranks 1, 3, and 5.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">For topic 1, the average precision (AP) is:</p><pre style="overflow: auto; width: 964.25px; padding: 0px 0px 5px; font-size: 12px; line-height: 15px; font-family: 'Courier New', Courier; color: #222222; margin-top: 0px; margin-bottom: 0px; background-color: #fafafa; border: 2px solid #efefef; ">(1/1 + 2/2 + 3/4 + 4/7) / 4 = 0.83 </pre><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">For topic 2, the average precision (AP) is:</p><pre style="overflow: auto; width: 964.25px; padding: 0px 0px 5px; font-size: 12px; line-height: 15px; font-family: 'Courier New', Courier; color: #222222; margin-top: 0px; margin-bottom: 0px; background-color: #fafafa; border: 2px solid #efefef; ">(1/1 + 2/3 + 3/5 + 0 + 0) / 5 = 0.45 </pre><p style="font-family: Lucida, 'Lucida 
则MAP=">
Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Therefore MAP = (0.83 + 0.45) / 2 = 0.64.</p><h2>The DCG Method</h2><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">DCG stands for Discounted Cumulative Gain. The basic ideas of DCG are:</p><ol style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><li>The relevance of each result is measured on a graded scale.</li><li>The position of each result is taken into account: the higher a result is ranked, the more it matters.</li><li>Highly graded (i.e., good) results should contribute more the higher they are ranked; otherwise they are penalized.</li></ol><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Start with the first idea, graded relevance. Grading is finer-grained than the simple &#8220;relevant&#8221; or &#8220;not relevant&#8221; count used for precision, because results can be divided into several grades. A common 3-level scale is Good, Fair, and Bad, with relevance scores rel of Good: 3 / Fair: 2 / Bad: 1. Some finer-grained evaluations use a 5-level scale of Very Good, Good, Fair, Bad, and Very Bad, with scores rel that can be set to Very Good: 2 / Good: 1 / Fair: 0 / Bad: -1 / Very Bad: -2.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">The grading criteria can be set to suit the specific application. Very Good usually means that the result is fully on topic and that the page is rich in content and of high quality. As for each individual result</p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image14.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">The DCG formula is not unique; in theory, the logarithmic discount factor only needs to be smooth. In my view, the DCG formula below is more reasonable. It emphasizes relevance, and its discount factors for the 1st and 2nd results are also more reasonable:</p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, 
<img">
sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image15.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">With this formula, the discount factors for the first 4 positions are:</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "></p><table cellspacing="0" cellpadding="0" border="1" style="color: #000000; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; text-align: start; background-color: #ffffff; "><tbody><tr><td width="189" valign="top" style="font-size: small; "><p align="center">Position i</p></td><td width="189" valign="top" style="font-size: small; "><p align="center">log<sub>2</sub>&nbsp;(i+1)</p></td><td width="189" valign="top" style="font-size: small; "><p align="center">Discount factor 1/log<sub>2</sub>&nbsp;(i+1)</p></td></tr><tr><td width="189" valign="top" style="font-size: small; "><p align="center">1</p></td><td width="189" valign="top" style="font-size: small; "><p align="center">1</p></td><td width="189" valign="top" style="font-size: small; "><p align="center">1</p></td></tr><tr><td width="189" valign="top" style="font-size: small; "><p align="center">2</p></td><td width="189" valign="top" style="font-size: small; "><p align="center">1.59</p></td><td width="189" valign="top" style="font-size: small; "><p align="center">0.63</p></td></tr><tr><td width="189" valign="top" style="font-size: small; "><p align="center">3</p></td><td width="189" valign="top" style="font-size: small; "><p align="center">2</p></td><td width="189" valign="top" style="font-size: small; "><p align="center">0.5</p></td></tr><tr><td width="189" 
valign="top" style="font-size: small; "><p align="center">4</p></td><td width="189" valign="top" style="font-size: small; "><p align="center">2.32</p></td><td width="189" valign="top" style="font-size: small; "><p align="center">0.43</p></td></tr></tbody></table><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "></p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">取以2为底的log值也来自于经验公式，并不存在理论上的依据。实际上，Log的基数可以根据平滑的需求进行修改，当加大数值时（例如使用log<sub>5</sub>&nbsp;代替log<sub>2</sub>），折扣因子降低更为迅速，此时强调了前面结果的权重。</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">为了便于不同类型的query结果之间横向比较，以DCG为基础，一些评价系统还对DCG进行了归一，这些方法统称为nDCG（即 normalize DCG）。最常用的计算方法是通过除以每一个查询的理想值iDCG（ideal DCG）来进行归一，公式为：</p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image16.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">求nDCG需要标定出理想情况的iDCG，实际操作的时候是异常困难的，因为每个人对&#8220;最好的结果&#8221;理解往往各不相同，从海量数据里选出最优结果是很困难的任务，但是比较两组结果哪个更好通常更容易，所以实践应用中，通常选择结果对比的方法进行评估。</p><h2>怎样实现自动化的评估？</h2><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">以上所介绍的搜索引擎量化评估指标，在Cranfield评估框架（Cranfield Evaluation Framework）中被广泛使用。业界知名的TREC（文本信息检索会议）就一直基于此类方法组织信息检索评测和技术交流。除了TREC外，一些针对不同应用设计的Cranfield评测论坛也在进行进行（如 NTCIR、IREX等）。</p><p style="font-family: Lucida, 
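但Cranfield">
'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">The weak point of the Cranfield framework, however, lies in labeling the sample queries. Evaluating web retrieval with hand-labeled answers is labor-intensive and time-consuming, and only a few large companies can afford it. Improving algorithms and operating a search engine also require the evaluation feedback loop to be as short as possible. Automated evaluation is therefore very important for efficiency, and the most common automated approach is an A/B testing system.</p><h3>A/B Testing</h3><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image17.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><p align="center" style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><strong>A/B Testing System</strong></p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">When a user searches, an A/B testing system automatically assigns the user a group number (bucket id) and routes sampled traffic into different branches. Users in each bucket therefore see results from a different product version (or a different search engine). User behavior under each version is logged and analyzed to produce a set of metrics, and comparing these metrics shows which version is better.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">The metrics can be computed in two ways: from expert ratings or from click statistics.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Expert rating is usually done by core search engineers and product staff. They score the results of environments A and B against predefined criteria, compare the results query by query, and compute overall quality with methods such as nDCG.</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Click-based scoring is more automated. It relies on one assumption: at the same rank position, a result with more clicks is better than a result with fewer clicks. (For example, let A2 denote the 2nd result in test environment A. If A2 gets more clicks than B2, A2 is considered the better result.) Put plainly, trust the crowd, because the crowd's eyes are sharp. Under this assumption, the click-through rates of the top N results in environments A and B can be mapped automatically to scores. Aggregating clicks over a large number of queries then yields a reliable comparison.</p><h3>Interleaving Testing</h3><p 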
style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">另外2003年由Thorsten Joachims 等人提出的Interleaving testing方法也被广泛使用。该方法设计了一个元搜索引擎，用户输入查询词后，将查询词在几个著名搜索引擎中的查询结果随机混合反馈给用户，并收集随后用户的结果点击行为信息．根据用户不同的点击倾向性，就可以判断搜索引擎返回结果的优劣，</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">如下图所示，将算法A和B的结果交叉放置，并分流量进行测试，记录用户点击信息。根据点击分布来判断A和B环境的优劣。</p><p style="overflow-x: auto; width: 964.25px; overflow-y: hidden; font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><img border="0" _href="img://image1.jpg" _p="true" src="http://www.infoq.com/resource/articles/cyw-evaluate-seachengine-result-quality/zh/resources/image18.jpg" alt="" style="border: 0px; display: block; float: none; margin-left: auto; margin-right: auto; " /></p><p align="center" style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; "><strong>Interleaving Testing评估方法</strong></p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">Joachims同时证明了Interleaving Testing评价方法与传统Cranfield评价方法的结果具有较高的相关性。由于记录用户选择检索结果的行为是一个不耗费人力的过程，因此可以便捷的实现自动化的搜索效果评估。</p><h2>总结</h2><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">没有评估就没有进步&#8212;&#8212;对搜索效果的量化评测，目的是准确的找出现有搜索系统的不足（没有哪个搜索系统是完美的），进而一步一个脚印对算法、系统进行改进。本文为大家总结了常用的评价框架和评价指标。这些技术像一把把尺子，度量着搜索技术每一次前进的距离。</p><hr style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; " /><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">感谢<a 
href="http://www.infoq.com/cn/bycategory.action?authorName=%E5%BC%A0%E5%87%AF%E5%B3%B0" style="color: #0b59b2; ">张凯峰</a>对 本文的审校。</p><p style="font-family: Lucida, 'Lucida Grande', Tahoma, sans-serif; font-size: 13px; line-height: 19px; background-color: #ffffff; ">给InfoQ中文站投稿或者参与内容翻译工作，请邮件至<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#101;&#100;&#105;&#116;&#111;&#114;&#115;&#64;&#99;&#110;&#46;&#105;&#110;&#102;&#111;&#113;&#46;&#99;&#111;&#109;" style="color: #0b59b2; ">editors@cn.infoq.com</a>。也欢迎大家加入到<a target="_blank" href="http://groups.google.com/group/InfoQChina" style="color: #0b59b2; ">InfoQ中文站用户讨论组</a>中与我们的编辑和其他读者 朋友交流。</p><img src ="http://www.cppblog.com/humanchao/aggbug/196436.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/humanchao/" target="_blank">胡满超</a> 2012-12-19 11:03 <a href="http://www.cppblog.com/humanchao/archive/2012/12/19/196436.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item></channel></rss>