
Regular Expression Performance Comparison

The following tables provide comparisons between the following regular expression libraries:

GRETA.

The Boost regex library.

Henry Spencer's regular expression library - this is provided for comparison as a typical non-backtracking implementation.

Philip Hazel's PCRE library.

Details

Machine: Intel Pentium E2160 1.8GHz PC.

Compiler: Microsoft Visual C++ version 7.1.

C++ Standard Library: Dinkumware standard library version 313.

OS: Win32.

Boost version: 1.45.0.

PCRE version: 8.10.

As ever, care should be taken in interpreting the results. Only sensible regular expressions (rather than pathological cases) are given; most are taken from the Boost regex examples or from the Library of Regular Expressions. In addition, some variation in the relative performance of these libraries can be expected on other machines, as memory access and processor caching effects can be quite large for most finite state machine algorithms.

Averages

The following are the average relative scores for all the tests. The perfect regular expression library would score 1; in practice anything less than 2 is pretty good.

GRETA                 GRETA                 Boost                 Boost + C++ locale    PCRE                  Dynamic Xpressive
                      (non-recursive mode)
2.0308                5.24257               1.72796               1.90946               1.78887               2.78239
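The relative scores reported on this page appear to be each library's time divided by the fastest time recorded for the same expression (for example, 0.0683s / 0.0249s ≈ 2.74 for Boost on the Twain long search). The sketch below shows that normalization, plus an average over tests. The function names `relative_scores` and `average_score` are illustrative only, and the use of an arithmetic mean is an assumption, since the page does not say which kind of average was used.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Divide each library's time by the fastest time for the same expression,
// so the fastest library scores exactly 1.
std::vector<double> relative_scores(const std::vector<double>& seconds) {
    assert(!seconds.empty());
    const double best = *std::min_element(seconds.begin(), seconds.end());
    std::vector<double> scores;
    scores.reserve(seconds.size());
    for (double t : seconds) {
        scores.push_back(t / best);
    }
    return scores;
}

// Arithmetic mean of one library's relative scores across all tests
// (assumption: the page does not state which average it uses).
double average_score(const std::vector<double>& scores) {
    assert(!scores.empty());
    return std::accumulate(scores.begin(), scores.end(), 0.0) / scores.size();
}
```

Passing the six absolute times from one table row to `relative_scores` should reproduce that row's scores up to rounding; collecting one library's scores from every row and passing them to `average_score` gives a figure comparable to the averages above, if the arithmetic-mean assumption holds.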

Comparison 1: Long Search

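For each of the following regular expressions, the time taken to find all occurrences of the expression within a long English language text was measured (mtent12.txt from Project Gutenberg, 19MB). Each entry gives the time relative to the fastest library for that expression, followed by the absolute time in parentheses.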

Expression
    GRETA                 GRETA                 Boost                 Boost + C++ locale    PCRE                  Dynamic Xpressive
                          (non-recursive mode)

Twain
    1 (0.0249s)           1 (0.0249s)           2.74 (0.0683s)        2.75 (0.0684s)        1.1 (0.0273s)         1.02 (0.0254s)

Huck[[:alpha:]]+
    1 (0.0239s)           1.02 (0.0244s)        2.78 (0.0664s)        2.78 (0.0665s)        1.06 (0.0254s)        1.02 (0.0244s)

[[:alpha:]]+ing
    4.37 (2.19s)          9.94 (4.97s)          1 (0.5s)              1.03 (0.515s)         5.16 (2.58s)          2.19 (1.09s)

^[^ ]*?Twain
    4.63 (0.796s)         13.9 (2.39s)          1 (0.172s)            1.02 (0.176s)         3.09 (0.531s)         2.36 (0.406s)

Tom|Sawyer|Huckleberry|Finn
    4.92 (0.274s)         15.4 (0.859s)         1.37 (0.0761s)        1.4 (0.0781s)         1.03 (0.0576s)        1 (0.0556s)

(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)
    2.56 (0.359s)         7 (0.984s)            1 (0.141s)            1 (0.141s)            1.75 (0.246s)         1.28 (0.179s)
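As a rough illustration of what a "find all occurrences" measurement involves, here is a minimal sketch that scans a text once and times the scan. It uses `std::regex` from the C++ standard library as a stand-in for the libraries compared here; it is not the harness that produced these numbers. The names `SearchResult` and `time_search` are hypothetical, and compiling the expression outside the timed region is an assumption about how such a test would be set up.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

struct SearchResult {
    std::size_t matches;  // number of non-overlapping matches found
    double seconds;       // wall-clock time for one full scan of the text
};

// Scan `text` once for every occurrence of `pattern` and time the scan.
// std::regex is a stand-in here, not one of the benchmarked libraries.
SearchResult time_search(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);  // compiled before the clock starts (assumption)
    const auto start = std::chrono::steady_clock::now();
    const auto count = std::distance(
        std::sregex_iterator(text.begin(), text.end(), re),
        std::sregex_iterator());
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return {static_cast<std::size_t>(count), elapsed.count()};
}
```

Calling `time_search("Tom|Sawyer|Huckleberry|Finn", text)` with the contents of a file loaded into `text` returns the match count and elapsed seconds. A single scan of a short text is too fast to time reliably, so a real measurement would repeat the scan and average.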

Comparison 2: Medium Sized Search

For each of the following regular expressions the time taken to find all occurrences of the expression within a medium sized English language text was measured (the first 50K from mtent12.txt - up to the end of Chapter 1).

Expression
    GRETA                 GRETA                 Boost                 Boost + C++ locale    PCRE                  Dynamic Xpressive
                          (non-recursive mode)

Twain
    1 (5.91e-005s)        1.03 (6.1e-005s)      3.81 (0.000225s)      3.87 (0.000229s)      2.45 (0.000145s)      1.23 (7.24e-005s)

Huck[[:alpha:]]+
    1 (5.91e-005s)        1.02 (6.01e-005s)     3.29 (0.000194s)      3.29 (0.000195s)      1.71 (0.000101s)      1.1 (6.48e-005s)

[[:alpha:]]+ing
    5.19 (0.00586s)       11.7 (0.0132s)        1 (0.00113s)          1.03 (0.00116s)       5.95 (0.00672s)       2.38 (0.00268s)

^[^ ]*?Twain
    4.39 (0.00207s)       13.2 (0.00622s)       1 (0.000473s)         1 (0.000473s)         3.03 (0.00143s)       2.32 (0.0011s)

Tom|Sawyer|Huckleberry|Finn
    2.27 (0.000899s)      6.61 (0.00262s)       1 (0.000396s)         1.04 (0.000412s)      1.23 (0.000488s)      1.27 (0.000504s)

(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)
    1.82 (0.00125s)       4.62 (0.00317s)       1 (0.000687s)         1.02 (0.000701s)      1.47 (0.00101s)       1.31 (0.0009s)

Comparison 3: C++ Code Search

For each of the following regular expressions the time taken to find all occurrences of the expression within the C++ source file boost/crc.hpp was measured.

Expression
    GRETA                 GRETA                 Boost                 Boost + C++ locale    PCRE                  Dynamic Xpressive
                          (non-recursive mode)

^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\<\w+\>([ ]*\([^)]*\))?[[:space:]]*)*(\<\w*\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\{|:[^;\{()]*\{)
    6.07 (0.000717s)      27.9 (0.00329s)       1 (0.000118s)         1.03 (0.000122s)      3.23 (0.000381s)      1.81 (0.000214s)

(^[ ]*#(?:[^\\\n]|\\[^\n_[:punct:][:alnum:]]*[\n[:punct:][:word:]])*)|(//[^\n]*|/\*.*?\*/)|\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\>|('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")|\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned|using|virtual|void|volatile|wchar_t|while)\>
    1 (0.00275s)          2.84 (0.00781s)       1.38 (0.00378s)       1.38 (0.00378s)       3.2 (0.00878s)        NA

^[ ]*#[ ]*include[ ]+("[^"]+"|<[^>]+>)
    3.84 (0.000747s)      16.6 (0.00323s)       1.02 (0.000198s)      1 (0.000195s)         1.92 (0.000374s)      1.33 (0.000259s)

^[ ]*#[ ]*include[ ]+("boost/[^"]+"|<boost/[^>]+>)
    3.84 (0.000747s)      16.6 (0.00323s)       1 (0.000195s)         1.02 (0.000198s)      1.92 (0.000374s)      1.33 (0.000259s)

Comparison 4: HTML Document Search

For each of the following regular expressions the time taken to find all occurrences of the expression within the html file libs/libraries.htm was measured.

Expression
    GRETA                 GRETA                 Boost                 Boost + C++ locale    PCRE                  Dynamic Xpressive
                          (non-recursive mode)

beman|john|dave
    3.15 (0.000915s)      8.42 (0.00244s)       1 (0.00029s)          1.42 (0.000412s)      1.26 (0.000366s)      7.57 (0.0022s)

<p>.*?</p>
    1 (8.96e-005s)        1.06 (9.53e-005s)     2.38 (0.000214s)      3.15 (0.000282s)      2.22 (0.000198s)      10.6 (0.000946s)

<a[^>]+href=("[^"]*"|[^[:space:]]+)[^>]*>
    1.17 (0.000533s)      1.63 (0.000747s)      1 (0.000457s)         2.4 (0.0011s)         1.07 (0.000488s)      5.34 (0.00244s)

<h[12345678][^>]*>.*?</h[12345678]>
    1 (0.00016s)          1.09 (0.000175s)      1.38 (0.000221s)      1.81 (0.00029s)       1.29 (0.000206s)      8.57 (0.00137s)

<img[^>]+src=("[^"]*"|[^[:space:]]+)[^>]*>
    1 (7.43e-005s)        1.03 (7.63e-005s)     3.28 (0.000244s)      4 (0.000297s)         2.56 (0.000191s)      9.85 (0.000732s)

<font[^>]+face=("[^"]*"|[^[:space:]]+)[^>]*>.*?</font>
    1 (6.86e-005s)        1.03 (7.06e-005s)     3.56 (0.000244s)      4.33 (0.000297s)      2.67 (0.000183s)      9.11 (0.000625s)

Comparison 5: Simple Matches

For each of the following regular expressions the time taken to match against the text indicated was measured.

Expression
Text
    GRETA                 GRETA                 Boost                 Boost + C++ locale    PCRE                  Dynamic Xpressive
                          (non-recursive mode)

abc
abc
    1.37 (2.09e-007s)     1.9 (2.9e-007s)       2.15 (3.28e-007s)     2.29 (3.5e-007s)      1 (1.53e-007s)        1.81 (2.76e-007s)

^([0-9]+)(\-| |$)(.*)$
100- this is a line of ftp response which contains a message string
    1.3 (5.21e-007s)      2.19 (8.79e-007s)     1.52 (6.1e-007s)      1.63 (6.56e-007s)     1 (4.02e-007s)        1.44 (5.81e-007s)

([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}
1234-5678-1234-456
    1.46 (6.4e-007s)      1.97 (8.64e-007s)     2.03 (8.94e-007s)     2.1 (9.23e-007s)      1 (4.39e-007s)        2.03 (8.94e-007s)

^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
john@johnmaddock.co.uk
    1.12 (1.16e-006s)     1.63 (1.7e-006s)      1.49 (1.55e-006s)     1.54 (1.61e-006s)     1 (1.04e-006s)        1.46 (1.52e-006s)

^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
foo12@foo.edu
    1.09 (9.54e-007s)     1.7 (1.49e-006s)      1.46 (1.28e-006s)     1.56 (1.37e-006s)     1 (8.78e-007s)        1.46 (1.28e-006s)

^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
bob.smith@foo.tv
    1.1 (9.82e-007s)      1.7 (1.52e-006s)      1.43 (1.28e-006s)     1.5 (1.34e-006s)      1 (8.94e-007s)        1.43 (1.28e-006s)

^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$
EH10 2QQ
    1.11 (3.06e-007s)     1.67 (4.62e-007s)     1.78 (4.91e-007s)     2 (5.51e-007s)        1 (2.76e-007s)        1.73 (4.77e-007s)

^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$
G1 1AA
    1.05 (2.9e-007s)      1.67 (4.62e-007s)     1.78 (4.91e-007s)     1.94 (5.36e-007s)     1 (2.76e-007s)        1.73 (4.77e-007s)

^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$
SW1 1ZZ
    1 (2.76e-007s)        1.62 (4.47e-007s)     1.79 (4.92e-007s)     1.94 (5.36e-007s)     1 (2.76e-007s)        1.73 (4.77e-007s)

^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$
4/1/2001
    1.2 (3.13e-007s)      1.63 (4.24e-007s)     1.69 (4.4e-007s)      1.77 (4.62e-007s)     1 (2.6e-007s)         1.95 (5.06e-007s)

^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$
12/12/2001
    1.11 (2.9e-007s)      1.57 (4.1e-007s)      1.71 (4.47e-007s)     1.88 (4.91e-007s)     1 (2.61e-007s)        1.88 (4.91e-007s)

^[-+]?[[:digit:]]*\.?[[:digit:]]*$
123
    1 (2.53e-007s)        1.5 (3.8e-007s)       1.82 (4.62e-007s)     1.88 (4.77e-007s)     1.03 (2.6e-007s)      1.59 (4.02e-007s)

^[-+]?[[:digit:]]*\.?[[:digit:]]*$
+3.14159
    1 (2.76e-007s)        1.67 (4.62e-007s)     1.78 (4.91e-007s)     1.95 (5.37e-007s)     1 (2.76e-007s)        1.6 (4.4e-007s)

^[-+]?[[:digit:]]*\.?[[:digit:]]*$
-3.14159
    1 (2.76e-007s)        1.67 (4.62e-007s)     1.78 (4.91e-007s)     1.95 (5.37e-007s)     1 (2.76e-007s)        1.6 (4.4e-007s)
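The simple-match times are well under a microsecond, so any attempt to reproduce them needs to repeat the match many times and divide. The sketch below does this with `std::regex_match`, again as a stand-in for the benchmarked libraries. The names `MatchTiming` and `time_match`, and the default repetition count, are hypothetical, and the page does not describe how its own measurements were taken.

```cpp
#include <cassert>
#include <chrono>
#include <regex>
#include <string>

struct MatchTiming {
    bool matched;             // whether the whole text matched the pattern
    double seconds_per_call;  // mean wall-clock time of one regex_match call
};

// Match `text` against `pattern` repeatedly and report the mean time per call.
// The repetition count is an arbitrary choice for this sketch.
MatchTiming time_match(const std::string& pattern, const std::string& text,
                       int repetitions = 100000) {
    assert(repetitions > 0);
    const std::regex re(pattern);  // compiled once, outside the timed loop
    bool matched = false;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i) {
        matched = std::regex_match(text, re);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return {matched, elapsed.count() / repetitions};
}
```

For instance, `time_match("^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$", "4/1/2001")` reports whether the date string matches and the mean seconds per call. Absolute numbers from `std::regex` on a modern machine are not comparable with the table, which was produced with different libraries on the hardware listed under Details.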


© Copyright John Maddock 2003

Use, modification and distribution are subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

Posted on 2011-07-09 12:21 by 肥仔. Category: Regular Expressions

