How eSNACC Encodes and Decodes Lengths

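This article dissects asn-len.h/c to learn from the source code how eSNACC encodes and decodes lengths.

Before getting to the source, I think a few key points are well worth stressing:

1. eSNACC's encoder works in reverse order: it encodes the data and writes it into the buffer first, which tells it how long the encoded data is, and only then encodes that length and places it in the buffer in front of the data. The design is meant to limit the performance cost; many other compilers use a temporary buffer for this job, and that costs performance. See the eSNACC documentation for the details. The thing to remember is that eSNACC encodes in reverse.

2. eSNACC supports both definite-length and indefinite-length encoding. With definite length, a few bytes in front of the data state the length of the data that follows. With indefinite length, the length byte in front of the data is 0x80, meaning the length is not known in advance, and the end of the data is marked with an EOC (End-Of-Contents).

3. eSNACC supports BER and DER. BER allows indefinite length while DER supports only definite length, so their encode and decode functions differ somewhat.

4. If PROTO and PARAMS in the code look unfamiliar, read 《关于老式函数声明》 ("On Old-Style Function Declarations") in this series.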
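To make point 1 concrete, here is a minimal sketch of reverse encoding. RvsBuf, rvs_init, rvs_put_byte and enc_small_octets are hypothetical names made up for this illustration; they are not eSNACC's GenBuf API, and the fixed 64-byte array with no growth is an assumption to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reverse-writing buffer: bytes are stored from the end
 * of the array toward the front, so data written later ends up earlier. */
typedef struct {
    unsigned char data[64];
    size_t start;               /* index of the first valid byte */
} RvsBuf;

static void rvs_init(RvsBuf *b) { b->start = sizeof b->data; }

static void rvs_put_byte(RvsBuf *b, unsigned char c)
{
    assert(b->start > 0);       /* sketch only: no growth, no error path */
    b->data[--b->start] = c;
}

/* Encode a short OCTET STRING (n < 128): content, then length, then tag.
 * Returns the total number of bytes written. */
static size_t enc_small_octets(RvsBuf *b, const unsigned char *p, size_t n)
{
    size_t i;

    assert(n < 128);
    for (i = n; i > 0; i--)             /* last content byte goes in first */
        rvs_put_byte(b, p[i - 1]);
    rvs_put_byte(b, (unsigned char)n);  /* length goes in after the content */
    rvs_put_byte(b, 0x04);              /* OCTET STRING tag goes in last */
    return n + 2;
}
```

Encoding the two bytes 'h', 'i' this way leaves 04 02 68 69 at the tail of the array, in normal wire order, even though the tag was the last byte written. No temporary buffer is needed because the length is written only after the content is already in place.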

Now let's go through the source in detail.

typedef unsigned long AsnLen;

/*
 * BER Encoding/Decoding routines
 */

/* max unsigned value - used for internal rep of indef len */
#define INDEFINITE_LEN ~0L

So eSNACC defines lengths with the type AsnLen, which is simply unsigned long, and INDEFINITE_LEN (all bits set) is the internal marker for an indefinite length.

#ifdef USE_INDEF_LEN

#define BEncEocIfNec( b) BEncEoc (b)

/*
 * include len for EOC (2 must be first due to BEncIndefLen
 * - ack! ugly macros!)
 */
#define BEncConsLen( b, len) 2 + BEncIndefLen(b)

#else /* use definite length - faster?/smaller encodings */

/* do nothing since only using definite lens */
#define BEncEocIfNec( b)
#define BEncConsLen( b, len) BEncDefLen(b, len)

#endif

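The ifdef above defines two macros for BER, each with an indefinite-length and a definite-length version:

BEncEocIfNec encodes the EOC marker in BER. As the code shows, with indefinite length the work is done by BEncEoc; with definite length no EOC is needed at all, so the macro does nothing.

BEncConsLen encodes the length of the contents in BER. With indefinite length it calls BEncIndefLen to encode the length, and because the contents must be followed by 2 bytes of EOC, it adds 2 in front; with definite length no EOC is needed, so it simply calls BEncDefLen.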

/*
 * writes indefinite length byte to buffer. 'returns' encoded len (1)
 */
#define BEncIndefLen( b)\
    1;\
    BufPutByteRvs (b, 0x80);

#ifndef _DEBUG
#define BEncEoc( b)\
    2;\
    BufPutByteRvs (b, 0);\
    BufPutByteRvs (b, 0);
#endif

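This part gives the two macros that encode an indefinite length and an EOC. They look odd, unlike the macros we usually see; when I first read them I thought the syntax was wrong. But they really are designed this way. First look at the notes the author left in the source: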

/*
 * Warning: many of these routines are MACROs for performance reasons
 * - be carful where you use them. Don't use more than one per
 * assignment statement -
 * (eg itemLen += BEncEoc (b) + BEncFoo (b) ..; this
 * will break the code)
 */

/*
 * include len for EOC (2 must be first due to BEncIndefLen
 * - ack! ugly macros!)
 */

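As you can see, the author knows full well how ugly these macros are, and knows their drawback too: they cannot be chained in an expression the way function calls or variables can, or the code breaks. All of this, alas, is for performance; nothing is off limits!

The point of quoting the above is to remind you to be very careful when using these macros. Now let's analyze them:

First, these macros have to encode the length. Second, they also have to return the number of bytes they encoded. It is because they must do both that they end up written this way:

The number before the first semicolon in each macro is the return value. For example, itemLen += BEncIndefLen (b); expands to itemLen += 1; BufPutByteRvs (b, 0x80);; so the assignment ends at the first semicolon.

What follows that semicolon pushes the encoded bytes into the buffer, which is the actual encoding.

It is as simple as that.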
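The expansion behaviour can be checked with a small stand-in. The macros ENC_INDEF_LEN and ENC_EOC and the helper put_rvs below are hypothetical, written only to have the same shape as BEncIndefLen and BEncEoc; they are not eSNACC code, and the 16-byte static array is an assumption for the sketch. A compiler will probably warn about statements with no effect in the broken variants, which is exactly the symptom to look for.

```c
#include <assert.h>

static unsigned char out[16];
static int pos = 16;                /* write position, moves toward 0 */

static void put_rvs(unsigned char c) { assert(pos > 0); out[--pos] = c; }

/* Same shape as BEncIndefLen/BEncEoc: value first, side effects after. */
#define ENC_INDEF_LEN() 1; put_rvs(0x80);
#define ENC_EOC()       2; put_rvs(0); put_rvs(0);

/* One macro per statement: the sum is what you expect. */
static int sum_ok(void)
{
    int len = 0;
    len += ENC_EOC();           /* len += 2; put_rvs(0); put_rvs(0);; */
    len += ENC_INDEF_LEN();     /* len += 1; put_rvs(0x80);;          */
    return len;                 /* 3 */
}

/* Two macros in one statement: it still compiles, but the assignment
 * ends at the first ';', so "+ 1;" becomes a separate no-op expression. */
static int sum_broken(void)
{
    int len = 0;
    len += ENC_EOC() + ENC_INDEF_LEN();
    return len;                 /* 2, not 3 */
}

/* Why "2 must be first": a constant before the macro joins its value. */
static int cons_len_ok(void)
{
    int len = 2 + ENC_INDEF_LEN();      /* int len = 2 + 1; put_rvs(0x80);; */
    return len;                         /* 3 */
}

/* A constant after the macro lands behind the ';' and is lost. */
static int cons_len_swapped(void)
{
    int len = ENC_INDEF_LEN() + 2;      /* int len = 1; put_rvs(0x80); + 2;; */
    return len;                         /* 1 */
}
```

In every variant the bytes still reach the buffer; only the returned count goes wrong, which is why this kind of mistake is easy to miss.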

///*************************************** Take a break *************************

To make the rest of the .h file easier to explain, let's first jump to the .c file and look at the function implementations:

/*
 * BER encode/decode routines
 */
AsnLen
BEncDefLen PARAMS ((b, len),
    GenBuf *b _AND_
    AsnLen len)
{
    /*
     * unrolled for efficiency
     * check each possibitlity of the 4 byte integer
     */
    if (len < 128)
    {
        BufPutByteRvs (b, (unsigned char)len);
        return 1;
    }
    else if (len < 256)
    {
        BufPutByteRvs (b, (unsigned char)len);
        BufPutByteRvs (b, 0x81);
        return 2;
    }
    else if (len < 65536)
    {
        BufPutByteRvs (b, (unsigned char)len);
        BufPutByteRvs (b, (unsigned char)(len >> 8));
        BufPutByteRvs (b, 0x82);
        return 3;
    }
    else if (len < 16777216) /* 2^24 */
    {
        BufPutByteRvs (b, (unsigned char)len);
        BufPutByteRvs (b, (unsigned char)(len >> 8));
        BufPutByteRvs (b, (unsigned char)(len >> 16));
        BufPutByteRvs (b, 0x83);
        return 4;
    }
    else
    {
        BufPutByteRvs (b, (unsigned char)len);
        BufPutByteRvs (b, (unsigned char)(len >> 8));
        BufPutByteRvs (b, (unsigned char)(len >> 16));
        BufPutByteRvs (b, (unsigned char)(len >> 24));
        BufPutByteRvs (b, 0x84);
        return 5;
    }
} /* BEncDefLen */

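Read carefully and you will see that the function does just this: it pushes the significant bytes of the length value into the buffer, then pushes one byte holding the count of those significant bytes with the high bit set (0x81 to 0x84), and finally returns the number of bytes the encoded length occupies.

All of this is done with BufPutByteRvs. Because that routine writes in reverse, the low-order byte is pushed first and the count byte last, so on the wire the count byte comes first, followed by the length bytes from high to low. The branching is there because the length is held in an AsnLen (an unsigned long), which the code treats as a 4-byte integer. If the value is small, say less than 128, a single byte is enough, so the encoding is simply compacted.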
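A few worked values may help here. The sketch below is not the eSNACC function: enc_def_len is a hypothetical helper that writes the same definite-length octets in forward (wire) order into a plain array, and it uses a loop where BEncDefLen is unrolled. Like the original, it assumes the length fits in 32 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Write the BER definite-length octets for len into out, in wire order.
 * Returns the number of octets written (1 to 5). */
static size_t enc_def_len(unsigned long len, unsigned char out[5])
{
    size_t n = 0;               /* number of significant length bytes */
    size_t i;
    unsigned long v;

    assert(len <= 0xFFFFFFFFUL);    /* sketch: 32-bit lengths only */
    if (len < 128) {                /* short form: the byte is the length */
        out[0] = (unsigned char)len;
        return 1;
    }
    for (v = len; v != 0; v >>= 8)  /* count significant bytes */
        n++;
    out[0] = (unsigned char)(0x80 | n);     /* long form: 0x81 to 0x84 */
    for (i = 0; i < n; i++)                 /* high-order byte first */
        out[1 + i] = (unsigned char)(len >> (8 * (n - 1 - i)));
    return 1 + n;
}
```

With this helper, 5 encodes as 05, 200 as 81 C8, 300 as 82 01 2C, 65536 as 83 01 00 00, and 16777216 as 84 01 00 00 00, matching the five branches of BEncDefLen.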

Now let's look at decoding:

/*
 * decodes and returns an ASN.1 length
 */
AsnLen
BDecLen PARAMS ((b, bytesDecoded, env),
    GenBuf *b _AND_
    unsigned long *bytesDecoded _AND_
    jmp_buf env)
{
    AsnLen len;
    AsnLen byte;
    int lenBytes;

    byte = (unsigned long) BufGetByte (b);

    if (BufReadError (b))
    {
        Asn1Error ("BDecLen: ERROR - decoded past end of data\n");
        longjmp (env, -13);
    }

    (*bytesDecoded)++;
    if (byte < 128) /* short length */
        return byte;

    else if (byte == (AsnLen) 0x080) /* indef len indicator */
        return (unsigned long)INDEFINITE_LEN;

    else /* long len form */
    {
        /*
         * strip high bit to get # bytes left in len
         */
        lenBytes = byte & (AsnLen) 0x7f;

        if (lenBytes > sizeof (AsnLen))
        {
            Asn1Error ("BDecLen: ERROR - length overflow\n");
            longjmp (env, -14);
        }

        (*bytesDecoded) += lenBytes;

        for (len = 0; lenBytes > 0; lenBytes--)
            len = (len << 8) | (AsnLen) BufGetByte (b);

        if (BufReadError (b))
        {
            Asn1Error ("BDecLen: ERROR - decoded past end of data\n");
            longjmp (env, -15);
        }

        return len;
    }
    /* not reached */
} /* BDecLen */

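First, BufGetByte reads the first byte from the buffer. What is this byte? Is it about the length? Of course it is! But the careful reader will ask: in the encoding function BEncDefLen, wasn't the byte-count byte pushed into the buffer last? Ha, this is where the first point from the top of the article comes in: reverse encoding. The actual mechanics are inside BufPutByteRvs, which I will cover another time, but I think this much already makes it clear.

Next the value read is examined. If it is less than 128, good, that is the length itself. If it is 0x80, the length is indefinite (point 2 at the top). Otherwise the low seven bits give the number of length bytes that follow, and it is simply a matter of reading them one by one and accumulating the value.

The function also handles errors: a read past the end of the data (on the first byte, or on the length bytes that follow) and a length that needs more bytes than an AsnLen holds each log a message and longjmp out. I will not expand on that here.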
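The three cases can be sketched on a plain byte array. This is an illustration, not the eSNACC routine: dec_len and LEN_INDEFINITE are hypothetical names, and the sketch reports errors through return codes instead of longjmp so that it stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

#define LEN_INDEFINITE (~0UL)   /* same idea as INDEFINITE_LEN */

/* Decode a BER length from in[0..avail). On success returns 0 and stores
 * the length in *len and the number of octets consumed in *used.
 * Returns -1 on truncated input, -2 if the length does not fit. */
static int dec_len(const unsigned char *in, size_t avail,
                   unsigned long *len, size_t *used)
{
    size_t n, i;
    unsigned long v = 0;

    if (avail < 1)
        return -1;
    if (in[0] < 128) {              /* short form */
        *len = in[0];
        *used = 1;
        return 0;
    }
    if (in[0] == 0x80) {            /* indefinite length indicator */
        *len = LEN_INDEFINITE;
        *used = 1;
        return 0;
    }
    n = in[0] & 0x7f;               /* long form: count of length bytes */
    if (n > sizeof(unsigned long))
        return -2;
    if (avail < 1 + n)
        return -1;
    for (i = 0; i < n; i++)         /* accumulate, high-order byte first */
        v = (v << 8) | in[1 + i];
    *len = v;
    *used = 1 + n;
    return 0;
}
```

Feeding it 82 01 2C yields length 300 with 3 octets consumed, and a lone 80 yields LEN_INDEFINITE, which the caller then has to resolve by scanning for the EOC.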

#ifdef _DEBUG
AsnLen
BEncEoc PARAMS ((b),
    GenBuf *b)
{
    /* EOC is two zero bytes, matching the macro version and the return value */
    BufPutByteRvs (b, 0);
    BufPutByteRvs (b, 0);
    return 2;
} /* BEncEoc */
#endif

/*
 * Decodes an End of Contents (EOC) marker from the given buffer.
 * Flags and error if the octets are non-zero or if a read error
 * occurs. Increments bytesDecoded by the length of the EOC marker.
 */
void
BDecEoc PARAMS ((b, bytesDecoded, env),
    GenBuf *b _AND_
    AsnLen *bytesDecoded _AND_
    jmp_buf env)
{
    if ((BufGetByte (b) != 0) || (BufGetByte (b) != 0) || BufReadError (b))
    {
        Asn1Error ("BDecEoc: ERROR - non zero byte in EOC or end of data reached\n");
        longjmp (env, -16);
    }

    (*bytesDecoded) += 2;
} /* BDecEoc */

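The code above implements encoding and decoding of EOC; as you can see, EOC is simply two consecutive all-zero bytes.

At the end of the implementation file there is one more function: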

/*
 * decodes and returns a DER encoded ASN.1 length
 */
AsnLen
DDecLen PARAMS ((b, bytesDecoded, env),
    GenBuf *b _AND_
    unsigned long *bytesDecoded _AND_
    jmp_buf env)
{
    AsnLen len;
    AsnLen byte;
    int lenBytes;

    byte = (AsnLen) BufGetByte (b);

    if (BufReadError (b))
    {
        Asn1Error ("DDecLen: ERROR - decoded past end of data\n");
        longjmp (env, -13);
    }

    (*bytesDecoded)++;
    if (byte < 128) /* short length */
        return byte;

    else if (byte == (AsnLen) 0x080) {/* indef len indicator */
        Asn1Error("DDecLen: ERROR - Indefinite length decoded");
        longjmp(env, -666);
    }

    else /* long len form */
    {
        /*
         * strip high bit to get # bytes left in len
         */
        lenBytes = byte & (AsnLen) 0x7f;

        if (lenBytes > sizeof (AsnLen))
        {
            Asn1Error ("DDecLen: ERROR - length overflow\n");
            longjmp (env, -14);
        }

        (*bytesDecoded) += lenBytes;

        for (len = 0; lenBytes > 0; lenBytes--)
            len = (len << 8) | (AsnLen) BufGetByte (b);

        if (BufReadError (b))
        {
            Asn1Error ("DDecLen: ERROR - decoded past end of data\n");
            longjmp (env, -15);
        }

        return len;
    }
    /* not reached */
} /* DDecLen */

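This is the decode function for DER. Its implementation is essentially the same as the BER one; the only difference is that DER does not allow indefinite length, so if the length byte is 0x80 it reports an error and bails out via longjmp.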
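For readers who have not met the jmp_buf parameter before, the sketch below shows the general setjmp/longjmp pattern a caller would use around such a decoder. strict_len and try_strict_len are hypothetical stand-ins, not eSNACC functions; the stand-in handles only the short form, and the wrapper collapses every error to -1, whereas the real routines pass distinct codes such as -666.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-in for a DER-style length decoder that bails out
 * through env instead of returning an error code. */
static unsigned long strict_len(const unsigned char *in, jmp_buf env)
{
    if (in[0] == 0x80)
        longjmp(env, -666);     /* indefinite length is not allowed */
    if (in[0] > 0x80)
        longjmp(env, -1);       /* long form left out of this sketch */
    return in[0];               /* short form: the byte is the length */
}

/* Returns 0 and stores the length on success, -1 if the decoder jumped. */
static int try_strict_len(const unsigned char *in, unsigned long *len)
{
    jmp_buf env;

    if (setjmp(env) != 0)       /* reached again when longjmp fires */
        return -1;
    *len = strict_len(in, env);
    return 0;
}
```

The caller sets the jump point once, and any decoder at any depth can abandon the whole decode with a single longjmp, which avoids checking a return code after every call.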

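At this point the encode and decode implementations are all clear. So what is left in the .h file? Apart from declarations of the functions above, there is one more thing of interest: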

/*
 * use if you know the encoded length will be 0 >= len <= 127
 * Eg for booleans, nulls, any resonable integers and reals
 *
 * NOTE: this particular Encode Routine does NOT return the length
 * encoded (1).
 */
#define BEncDefLenTo127( b, len)\
    BufPutByteRvs (b, (unsigned char) len)

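Yes, there is a macro like this, and the eSNACC documentation mentions it specifically. It too was added for efficiency: when you are sure the length is at most 127, you should call this macro directly instead of the encoding function described earlier. (The encoding function does the same thing apart from a few extra comparisons; the real gain is skipping the function call. Once again, nothing is off limits.)

Note: this macro only does the encoding; it does not return the encoded length (which is 1 byte)!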
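In practice this means the caller has to count the length byte by hand. The sketch below reverse-encodes a BOOLEAN TRUE to show the bookkeeping. ENC_DEF_LEN_TO_127, put_rvs and enc_bool_true are hypothetical stand-ins shaped like the eSNACC macro; they are not the library's code.

```c
#include <assert.h>

static unsigned char out[8];
static int pos = 8;                 /* write position, moves toward 0 */

static void put_rvs(unsigned char c) { assert(pos > 0); out[--pos] = c; }

/* Same shape as BEncDefLenTo127: writes the byte, yields no length. */
#define ENC_DEF_LEN_TO_127(len) put_rvs((unsigned char)(len))

/* Reverse-encode BOOLEAN TRUE and return the total encoded size. */
static int enc_bool_true(void)
{
    int total = 0;

    put_rvs(0xFF);                  /* content: TRUE */
    total += 1;
    ENC_DEF_LEN_TO_127(1);          /* length octet; nothing is returned... */
    total += 1;                     /* ...so the caller counts it */
    put_rvs(0x01);                  /* BOOLEAN tag */
    total += 1;
    return total;
}
```

The result is the three bytes 01 01 FF at the tail of the array. Writing total += ENC_DEF_LEN_TO_127(1) would not even compile here, because the stand-in expands to a call that returns void.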

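Well, eSNACC supports both BER and DER encoding and decoding, and their declarations are all similar, so I will not repeat them. Keep in mind the third point from the top of the article and the rest of the code is clear at a glance.

That wraps up the analysis of asn-len.h/c!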

Posted on 2012-04-20 11:00 by Tim. Category: eSNACC学习 (eSNACC study)
