<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++ Blog - 那谁's technical blog - Post category: linux kernel</title><link>http://www.cppblog.com/converse/category/10172.html</link><description>Areas of interest: high-performance server programming, algorithms, the Linux kernel</description><language>zh-cn</language><lastBuildDate>Mon, 11 Jan 2010 05:55:45 GMT</lastBuildDate><pubDate>Mon, 11 Jan 2010 05:55:45 GMT</pubDate><ttl>60</ttl><item><title>Tokyo Cabinet 1.4.19 reading notes (1): hash database overview</title><link>http://www.cppblog.com/converse/archive/2010/01/10/105317.html</link><dc:creator>那谁</dc:creator><author>那谁</author><pubDate>Sun, 10 Jan 2010 02:22:00 GMT</pubDate><guid>http://www.cppblog.com/converse/archive/2010/01/10/105317.html</guid><wfw:comment>http://www.cppblog.com/converse/comments/105317.html</wfw:comment><comments>http://www.cppblog.com/converse/archive/2010/01/10/105317.html#Feedback</comments><slash:comments>2</slash:comments><wfw:commentRss>http://www.cppblog.com/converse/comments/commentRss/105317.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/converse/services/trackbacks/105317.html</trackback:ping><description><![CDATA[I am now formally starting to study key-value persistent storage solutions. The first project I am reading is Tokyo Cabinet (TC), version 1.4.19.<br><br>Tokyo Cabinet supports several kinds of database: the hash database, the B+ tree database, the fixed-length database and the table database. So far I have only read the implementation of the first one, the hash database. I picked it for two reasons. First, it appears to be the most widely used type in TC. Second, its algorithm is simpler than the B+ tree's, and its performance is no worse.<br><br>First, a look at how the TC source is organized. Each of the database types above is implemented in a single file. For example, all of the hash database code lives in tchdb.c/h, which is only a little over 4,000 lines. The remaining source files fall roughly into two groups. The first is helper code shared by the various parts of the project. The second is the CLI programs that manage each kind of database; for example, tchmgr.c/h is the CLI program for managing hash databases. I describe this layout only to make one point: if you focus on the hash database or any other single database type, at least in TC, there is not much code to read.<br><br>Let us start with how the database file is organized.<br><img alt="" src="http://www.cppblog.com/images/cppblog_com/converse/12791/r_tokyo%20cabinet%20hash%20database%20file%20overview.png"><br>As the figure shows, a hash database file has four parts:<br>1) the database header<br>2) the bucket array<br>3) the free block pool<br>4) the record section, where the records themselves are stored<br>Each part is described below.<br><br>1) Database header<br>The header stores overall information about the database, including the following fields:<br>
<table summary="database header format">
    <tbody>
        <tr>
            <td class="label">name</td>
            <td class="label">offset</td>
            <td class="label">length (bytes)</td>
            <td class="label">feature</td>
        </tr>
        <tr>
            <td>magic number</td>
            <td class="number">0</td>
            <td class="number">32</td>
            <td>identification of the database; begins with "ToKyO CaBiNeT"</td>
        </tr>
        <tr>
            <td>database type</td>
            <td class="number">32</td>
            <td class="number">1</td>
            <td>hash (0x01) / B+ tree (0x02) / fixed-length (0x03) / table (0x04)</td>
        </tr>
        <tr>
            <td>additional flags</td>
            <td class="number">33</td>
            <td class="number">1</td>
            <td>logical union of open (1&lt;&lt;0) and fatal (1&lt;&lt;1)</td>
        </tr>
        <tr>
            <td>alignment power</td>
            <td class="number">34</td>
            <td class="number">1</td>
            <td>the alignment size, by power of 2</td>
        </tr>
        <tr>
            <td>free block pool power</td>
            <td class="number">35</td>
            <td class="number">1</td>
            <td>the number of elements in the free block pool, by power of 2</td>
        </tr>
        <tr>
            <td>options</td>
            <td class="number">36</td>
            <td class="number">1</td>
            <td>logical union of large (1&lt;&lt;0), Deflate (1&lt;&lt;1), BZIP2 (1&lt;&lt;2), TCBS (1&lt;&lt;3), extra codec (1&lt;&lt;4)</td>
        </tr>
        <tr>
            <td>bucket number</td>
            <td class="number">40</td>
            <td class="number">8</td>
            <td>the number of elements of the bucket array</td>
        </tr>
        <tr>
            <td>record number</td>
            <td class="number">48</td>
            <td class="number">8</td>
            <td>the number of records in the database</td>
        </tr>
        <tr>
            <td>file size</td>
            <td class="number">56</td>
            <td class="number">8</td>
            <td>the file size of the database</td>
        </tr>
        <tr>
            <td>first record</td>
            <td class="number">64</td>
            <td class="number">8</td>
            <td>the offset of the first record</td>
        </tr>
        <tr>
            <td>opaque region</td>
            <td class="number">128</td>
            <td class="number">128</td>
            <td>users can use this region arbitrarily</td>
        </tr>
    </tbody>
</table>
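<br>The header can be decoded directly from the first 256 bytes of the file using the offsets in the table. The sketch below is a minimal illustration that rests on three assumptions. First, integers are little-endian. Second, the bucket array begins immediately after the 256-byte header. Third, each bucket entry is 4 bytes wide, or 8 bytes when the "large" option bit is set. The names hdb_header, parse_header, bucket_array_end and the HDB_* constants are hypothetical and are not part of Tokyo Cabinet's API. TC itself loads these fields into its TCHDB object (tchdbloadmeta).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical constants derived from the header table. */
#define HDB_HEADSIZ    256   /* opaque region ends at 128 + 128 */
#define HDB_OPT_LARGE  0x01  /* "large" bit (bit 0) of the options byte */

/* Illustrative copy of the header fields (not TC's TCHDB). */
typedef struct {
  uint8_t  type;   /* database type: 0x01 = hash */
  uint8_t  flags;  /* additional flags: open / fatal */
  uint8_t  apow;   /* alignment power */
  uint8_t  fpow;   /* free block pool power */
  uint8_t  opts;   /* options: large, Deflate, BZIP2, TCBS, extra codec */
  uint64_t bnum;   /* number of buckets */
  uint64_t rnum;   /* number of records */
  uint64_t fsiz;   /* file size */
  uint64_t frec;   /* offset of the first record */
} hdb_header;

/* Read a little-endian 64-bit integer independent of host byte order. */
static uint64_t read_le64(const unsigned char *p) {
  uint64_t v = 0;
  for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
  return v;
}

/* Decode the 256-byte header; returns 0 on success, -1 on bad magic. */
static int parse_header(const unsigned char *buf, hdb_header *h) {
  static const char magic[] = "ToKyO CaBiNeT";
  if (memcmp(buf, magic, sizeof(magic) - 1) != 0) return -1;
  h->type  = buf[32];
  h->flags = buf[33];
  h->apow  = buf[34];
  h->fpow  = buf[35];
  h->opts  = buf[36];
  h->bnum  = read_le64(buf + 40);
  h->rnum  = read_le64(buf + 48);
  h->fsiz  = read_le64(buf + 56);
  h->frec  = read_le64(buf + 64);
  return 0;
}

/* Offset just past the bucket array (assumed to follow the header):
   4-byte entries, or 8-byte entries when "large" is set. */
static uint64_t bucket_array_end(const hdb_header *h) {
  uint64_t width = (h->opts & HDB_OPT_LARGE) ? 8 : 4;
  return HDB_HEADSIZ + h->bnum * width;
}
```

Under the same assumptions, bucket_array_end gives the smallest region that must be kept in memory, because the header and the bucket array are always mapped.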
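<br>Note that the table above comes from the official Tokyo Cabinet documentation, available <a href="http://1978th.net/tokyocabinet/spex-en.html#fileformat">here</a>. All integers stored in the database file are little-endian; this point will not be repeated below. The opaque region occupies offsets 128 through 255, so the database header is 256 bytes long.<br>Every API that operates on a hash database takes a pointer to a TCHDB object. That structure holds, among other things, every field of the database header. So whenever a hash database is opened or created, the header is read into that object (function tchdbloadmeta).<br><br>2) Bucket array<br>Each element of the bucket array is an integer. It is 32 bits wide by default, or 64 bits when the database uses the "large" option (bit 1&lt;&lt;0 of the options byte in the header). The choice depends on that option, not on whether the system is 32-bit or 64-bit. Hashing a key selects one bucket, and that bucket's element locates the first record of its chain in the database file. Like the chain fields of a record, the value is stored as an alignment quotient: the file offset shifted right by the alignment power.<br><br>3) Free block pool<br>Each element of the free block pool is the following structure:<br>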
<div style="border: 1px solid #cccccc; padding: 4px 5px 4px 4px; background-color: #eeeeee; font-size: 13px; width: 98%;"><span style="color: #000000;">#include&nbsp;&lt;stdint.h&gt;&nbsp;&nbsp;</span><span style="color: #008000;">//&nbsp;uint64_t,&nbsp;uint32_t</span><br><span style="color: #000000;">typedef&nbsp;</span><span style="color: #0000ff;">struct</span><span style="color: #000000;">&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000;">//</span><span style="color: #008000;">&nbsp;type&nbsp;of&nbsp;structure&nbsp;for&nbsp;a&nbsp;free&nbsp;block</span><span style="color: #008000;"><br></span><span style="color: #000000;">&nbsp;&nbsp;uint64_t&nbsp;off;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000;">//</span><span style="color: #008000;">&nbsp;offset&nbsp;of&nbsp;the&nbsp;block</span><span style="color: #008000;"><br></span><span style="color: #000000;">&nbsp;&nbsp;uint32_t&nbsp;rsiz;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000;">//</span><span style="color: #008000;">&nbsp;size&nbsp;of&nbsp;the&nbsp;block</span><span style="color: #008000;"><br></span><span style="color: #000000;">}&nbsp;HDBFB;&nbsp;</span></div>
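<br>The structure has only two members: the offset of the free block in the database file and the size of that block. The free block pool keeps track of the space left behind by deleted records so that it can be reused. The free-pool operations and APIs will be analyzed in detail in a later post.<br><br>4) Record section<br>Each record has the following layout:<br>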
<table summary="record format">
    <tbody>
        <tr>
            <td class="label">name</td>
            <td class="label">offset</td>
            <td class="label">length (bytes)</td>
            <td class="label">feature</td>
        </tr>
        <tr>
            <td>magic number</td>
            <td class="number">0</td>
            <td class="number">1</td>
            <td>identification of the record block; always 0xC8</td>
        </tr>
        <tr>
            <td>hash value</td>
            <td class="number">1</td>
            <td class="number">1</td>
            <td>the hash value to decide the path of the hash chain</td>
        </tr>
        <tr>
            <td>left chain</td>
            <td class="number">2</td>
            <td class="number">4</td>
            <td>the alignment quotient of the destination of the left chain</td>
        </tr>
        <tr>
            <td>right chain</td>
            <td class="number">6</td>
            <td class="number">4</td>
            <td>the alignment quotient of the destination of the right chain</td>
        </tr>
        <tr>
            <td>padding size</td>
            <td class="number">10</td>
            <td class="number">2</td>
            <td>the size of the padding</td>
        </tr>
        <tr>
            <td>key size</td>
            <td class="number">12</td>
            <td class="number">variable</td>
            <td>the size of the key</td>
        </tr>
        <tr>
            <td>value size</td>
            <td class="number">variable</td>
            <td class="number">variable</td>
            <td>the size of the value</td>
        </tr>
        <tr>
            <td>key</td>
            <td class="number">variable</td>
            <td class="number">variable</td>
            <td>the data of the key</td>
        </tr>
        <tr>
            <td>value</td>
            <td class="number">variable</td>
            <td class="number">variable</td>
            <td>the data of the value</td>
        </tr>
        <tr>
            <td>padding</td>
            <td class="number">variable</td>
            <td class="number">variable</td>
            <td>unused filler data</td>
        </tr>
    </tbody>
</table>
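<br>The fixed-size prefix of a used record is the first 12 bytes, up to and including the padding size. It can be decoded in the same way as the header. This sketch assumes the 4-byte chain fields shown in the table. Databases created with the "large" option may use wider fields, and those are not handled here. Each alignment quotient becomes a byte offset when shifted left by the alignment power from the header. The key size and value size have variable length, so this sketch stops before them. The names hdb_rec_prefix, parse_rec_prefix and HDB_MAGIC_REC are hypothetical.

```c
#include <stdint.h>

#define HDB_MAGIC_REC  0xC8  /* magic number of a record in use */

/* Hypothetical view of the 12-byte fixed prefix of a used record. */
typedef struct {
  uint8_t  hash;   /* hash byte that decides the path along the chain */
  uint64_t left;   /* byte offset of the left chain target */
  uint64_t right;  /* byte offset of the right chain target */
  uint16_t psiz;   /* padding size */
} hdb_rec_prefix;

static uint32_t read_le32(const unsigned char *p) {
  return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
         (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t read_le16(const unsigned char *p) {
  return (uint16_t)(p[0] | p[1] << 8);
}

/* Decode the prefix at p; chain fields hold alignment quotients, so
   shifting by apow yields file offsets. Returns -1 if p is not a used record. */
static int parse_rec_prefix(const unsigned char *p, unsigned apow,
                            hdb_rec_prefix *r) {
  if (p[0] != HDB_MAGIC_REC) return -1;
  r->hash  = p[1];
  r->left  = (uint64_t)read_le32(p + 2) << apow;
  r->right = (uint64_t)read_le32(p + 6) << apow;
  r->psiz  = read_le16(p + 10);
  return 0;
}
```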
<br>The layout above applies only to a record that is in use. When a record is deleted, its layout becomes:<br>
<table summary="free block format">
    <tbody>
        <tr>
            <td class="label">name</td>
            <td class="label">offset</td>
            <td class="label">length (bytes)</td>
            <td class="label">feature</td>
        </tr>
        <tr>
            <td>magic number</td>
            <td class="number">0</td>
            <td class="number">1</td>
            <td>identification of a free block; always 0xB0</td>
        </tr>
        <tr>
            <td>block size</td>
            <td class="number">1</td>
            <td class="number">4</td>
            <td>size of the block</td>
        </tr>
    </tbody>
</table>
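<br>The first byte tells the two layouts apart, so code that walks the record section can check it before interpreting anything else. The minimal sketch below uses the hypothetical names classify_block and block_kind. It treats 0xC8 as a used record and 0xB0 as a free block, and for a free block it reads the little-endian 4-byte block size that follows. Any other first byte is reported as invalid.

```c
#include <stdint.h>

#define HDB_MAGIC_REC   0xC8  /* record in use */
#define HDB_MAGIC_FREE  0xB0  /* deleted record, now a free block */

typedef enum { BLOCK_USED, BLOCK_FREE, BLOCK_BAD } block_kind;

/* Classify the block at p. For a free block, also store its size
   (4 little-endian bytes after the magic number) in *free_size. */
static block_kind classify_block(const unsigned char *p, uint32_t *free_size) {
  switch (p[0]) {
    case HDB_MAGIC_REC:
      return BLOCK_USED;
    case HDB_MAGIC_FREE:
      *free_size = (uint32_t)p[1] | (uint32_t)p[2] << 8 |
                   (uint32_t)p[3] << 16 | (uint32_t)p[4] << 24;
      return BLOCK_FREE;
    default:
      return BLOCK_BAD;  /* neither magic: wrong offset or corrupt data */
  }
}
```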
<br>The first difference between the two layouts is the magic number at the start. A value of 0xB0 means the record has been deleted and is now a free block. In that case the next 4 bytes hold the size of the free block, and the rest of the block can be ignored.<br><br>That covers the parts of a hash database file. The file diagram at the beginning also shows that the region from the header through the bucket array is mapped into memory with mmap as a shared mapping. More of the file may be mapped than this, but the header and the bucket array are always mapped. In other words, there is no fixed upper limit on how much of a hash database is mapped, but the lower limit is the header plus the bucket array.<br><br>In addition, the free block pool is loaded into a heap buffer allocated with malloc, and that buffer is stored in the fbpool pointer of TCHDB.<br><br>Every part except the record section is brought into memory, each in its own way, in order to speed up lookups. This will be explained in detail later.<br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/converse/" target="_blank">那谁</a> 2010-01-10 10:22 <a href="http://www.cppblog.com/converse/archive/2010/01/10/105317.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Linux kernel V2.6.11 study notes (6): interrupt handling</title><link>http://www.cppblog.com/converse/archive/2009/05/03/81773.html</link><dc:creator>那谁</dc:creator><author>那谁</author><pubDate>Sun, 03 May 2009 08:09:00 GMT</pubDate><guid>http://www.cppblog.com/converse/archive/2009/05/03/81773.html</guid><wfw:comment>http://www.cppblog.com/converse/comments/81773.html</wfw:comment><comments>http://www.cppblog.com/converse/archive/2009/05/03/81773.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://www.cppblog.com/converse/comments/commentRss/81773.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/converse/services/trackbacks/81773.html</trackback:ping><description><![CDATA[Summary: <a href='http://www.cppblog.com/converse/archive/2009/05/03/81773.html'>Read the full post</a><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/converse/" target="_blank">那谁</a> 2009-05-03 16:09 <a 
href="http://www.cppblog.com/converse/archive/2009/05/03/81773.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Linux kernel V2.6.11 study notes (5): exception handling</title><link>http://www.cppblog.com/converse/archive/2009/04/29/81496.html</link><dc:creator>那谁</dc:creator><author>那谁</author><pubDate>Wed, 29 Apr 2009 13:45:00 GMT</pubDate><guid>http://www.cppblog.com/converse/archive/2009/04/29/81496.html</guid><wfw:comment>http://www.cppblog.com/converse/comments/81496.html</wfw:comment><comments>http://www.cppblog.com/converse/archive/2009/04/29/81496.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/converse/comments/commentRss/81496.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/converse/services/trackbacks/81496.html</trackback:ping><description><![CDATA[Summary: <a href='http://www.cppblog.com/converse/archive/2009/04/29/81496.html'>Read the full post</a><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/converse/" target="_blank">那谁</a> 2009-04-29 21:45 <a href="http://www.cppblog.com/converse/archive/2009/04/29/81496.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Linux kernel V2.6.11 study notes (4): overview of interrupt and exception handling</title><link>http://www.cppblog.com/converse/archive/2009/04/28/81381.html</link><dc:creator>那谁</dc:creator><author>那谁</author><pubDate>Tue, 28 Apr 2009 15:28:00 
GMT</pubDate><guid>http://www.cppblog.com/converse/archive/2009/04/28/81381.html</guid><wfw:comment>http://www.cppblog.com/converse/comments/81381.html</wfw:comment><comments>http://www.cppblog.com/converse/archive/2009/04/28/81381.html#Feedback</comments><slash:comments>3</slash:comments><wfw:commentRss>http://www.cppblog.com/converse/comments/commentRss/81381.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/converse/services/trackbacks/81381.html</trackback:ping><description><![CDATA[Summary: <a href='http://www.cppblog.com/converse/archive/2009/04/28/81381.html'>Read the full post</a><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/converse/" target="_blank">那谁</a> 2009-04-28 23:28 <a href="http://www.cppblog.com/converse/archive/2009/04/28/81381.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Linux kernel V2.6.11 study notes (3): the switch_to macro</title><link>http://www.cppblog.com/converse/archive/2009/04/19/80421.html</link><dc:creator>那谁</dc:creator><author>那谁</author><pubDate>Sun, 19 Apr 2009 02:16:00 GMT</pubDate><guid>http://www.cppblog.com/converse/archive/2009/04/19/80421.html</guid><wfw:comment>http://www.cppblog.com/converse/comments/80421.html</wfw:comment><comments>http://www.cppblog.com/converse/archive/2009/04/19/80421.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://www.cppblog.com/converse/comments/commentRss/80421.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/converse/services/trackbacks/80421.html</trackback:ping><description><![CDATA[Summary: <a href='http://www.cppblog.com/converse/archive/2009/04/19/80421.html'>Read the full post</a><br><br><div align=right><a style="text-decoration:none;" 
href="http://www.cppblog.com/converse/" target="_blank">那谁</a> 2009-04-19 10:16 <a href="http://www.cppblog.com/converse/archive/2009/04/19/80421.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Linux kernel V2.6.11 study notes (2): list and hlist</title><link>http://www.cppblog.com/converse/archive/2009/04/11/79572.html</link><dc:creator>那谁</dc:creator><author>那谁</author><pubDate>Sat, 11 Apr 2009 02:47:00 GMT</pubDate><guid>http://www.cppblog.com/converse/archive/2009/04/11/79572.html</guid><wfw:comment>http://www.cppblog.com/converse/comments/79572.html</wfw:comment><comments>http://www.cppblog.com/converse/archive/2009/04/11/79572.html#Feedback</comments><slash:comments>8</slash:comments><wfw:commentRss>http://www.cppblog.com/converse/comments/commentRss/79572.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/converse/services/trackbacks/79572.html</trackback:ping><description><![CDATA[Summary: <a href='http://www.cppblog.com/converse/archive/2009/04/11/79572.html'>Read the full post</a><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/converse/" target="_blank">那谁</a> 2009-04-11 10:47 <a href="http://www.cppblog.com/converse/archive/2009/04/11/79572.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Linux kernel V2.6.11 study notes (1): the pid bitmap</title><link>http://www.cppblog.com/converse/archive/2009/04/10/79488.html</link><dc:creator>那谁</dc:creator><author>那谁</author><pubDate>Fri, 10 Apr 2009 04:57:00 
GMT</pubDate><guid>http://www.cppblog.com/converse/archive/2009/04/10/79488.html</guid><wfw:comment>http://www.cppblog.com/converse/comments/79488.html</wfw:comment><comments>http://www.cppblog.com/converse/archive/2009/04/10/79488.html#Feedback</comments><slash:comments>6</slash:comments><wfw:commentRss>http://www.cppblog.com/converse/comments/commentRss/79488.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/converse/services/trackbacks/79488.html</trackback:ping><description><![CDATA[Summary: <a href='http://www.cppblog.com/converse/archive/2009/04/10/79488.html'>Read the full post</a><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/converse/" target="_blank">那谁</a> 2009-04-10 12:57 <a href="http://www.cppblog.com/converse/archive/2009/04/10/79488.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>