﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++ Blog-diceidea-Post category-Dev log</title><link>http://www.cppblog.com/diceidea/category/7046.html</link><description>parser</description><language>zh-cn</language><lastBuildDate>Wed, 03 Sep 2008 10:25:55 GMT</lastBuildDate><pubDate>Wed, 03 Sep 2008 10:25:55 GMT</pubDate><ttl>60</ttl><item><title>DFA and lexical analysis</title><link>http://www.cppblog.com/diceidea/archive/2008/05/24/50954.html</link><dc:creator>diceidea</dc:creator><author>diceidea</author><pubDate>Sat, 24 May 2008 05:59:00 GMT</pubDate><guid>http://www.cppblog.com/diceidea/archive/2008/05/24/50954.html</guid><wfw:comment>http://www.cppblog.com/diceidea/comments/50954.html</wfw:comment><comments>http://www.cppblog.com/diceidea/archive/2008/05/24/50954.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/diceidea/comments/commentRss/50954.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/diceidea/services/trackbacks/50954.html</trackback:ping><description><![CDATA[For a hand-written lexical analyzer, NFAs and DFAs are practically unavoidable unless your grammar is very simple.<br>Once you have a regular expression for a pattern in the source program (that is, the character stream you want to process), you can construct the corresponding NFA and then convert it into a DFA. That DFA can then scan the source program and find every lexeme that matches the pattern. With this process you can write your own lexical analyzer for any programming language, whether a mainstream language or a scripting language, by building DFAs for all of its patterns (in practice these are usually merged into one DFA that takes the longest match).<br>Two articles explain the path from regular expressions to DFAs very well:<br><span id="ctl00_ArticleTopHeader_ArticleTitle" class="ArticleTopTitle">1. Writing own regular expression parser</span>
<strong>By <a href="http://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=30261">Amer Gerzic</a> (in English)</strong><span id="ctl00_ArticleTopHeader_ArticleTitle" class="ArticleTopTitle"><br></span><a href="http://www.codeproject.com/KB/recipes/OwnRegExpressionsParser.aspx">http://www.codeproject.com/KB/recipes/OwnRegExpressionsParser.aspx</a><br>Source code included.<br>2. <a href="http://www.cppblog.com/vczh/archive/2008/05/22/50763.html" id="viewpost1_TitleUrl" class="postTitle2">"Constructing a Regular Expression Engine" is fresh out of the oven!</a> (in Chinese), by vczh, South China University of Technology<br>http://www.cppblog.com/vczh/archive/2008/05/22/50763.html<br>After reading these two articles, writing a working lexer should be no problem.<br>Below is also a passage from the Dragon Book (Compilers: Principles, Techniques, and Tools) that distinguishes the terms token, pattern, and lexeme:<br>1. A token is a pair consisting of a token name and an optional attribute value. The token name is <span style="background-color: yellow;">an abstract</span> symbol representing a kind of <span style="background-color: yellow;">lexical unit</span> <span style="background-color: yellow;">(lexeme)</span>, e.g., a particular keyword, or a sequence of input characters denoting an identifier. The token names are the input symbols that the parser processes. In what follows, we shall generally write the name of a token in boldface. We will often refer to a token by its token name.<br>2. A pattern is a description of the form that the lexemes of a token may take. In the case of a keyword as a token, the pattern is just the sequence of characters that form the keyword. For identifiers and some other tokens, the pattern is a more complex structure that is matched by many strings.<br>3. 
A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an <span style="background-color: yellow;">instance of that token.</span><br>Notes:<br>1. More than one lexeme can match a pattern; for example, the identifier pattern matches pi, score, D2, and many other strings.<br>2. See Example 3.1 in the book.<br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/diceidea/" target="_blank">diceidea</a> 2008-05-24 13:59</div>]]></description></item><item><title>dev log (1)</title><link>http://www.cppblog.com/diceidea/archive/2008/05/12/49623.html</link><dc:creator>diceidea</dc:creator><author>diceidea</author><pubDate>Mon, 12 May 2008 03:50:00 GMT</pubDate><guid>http://www.cppblog.com/diceidea/archive/2008/05/12/49623.html</guid><wfw:comment>http://www.cppblog.com/diceidea/comments/49623.html</wfw:comment><comments>http://www.cppblog.com/diceidea/archive/2008/05/12/49623.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/diceidea/comments/commentRss/49623.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/diceidea/services/trackbacks/49623.html</trackback:ping><description><![CDATA[<p style="margin: 0in; font-family: Calibri; font-size: 11pt;">Summary for last
week:</p>
<p style="margin: 0in; font-family: Calibri; font-size: 11pt;">1. Understood the
basics of lexical analysis and syntax analysis (parsing), and what each one is
used for. The former uses regular expressions to recognize tokens; the latter
uses operator-precedence or recursive-descent parsing to build a syntax tree for
later steps, e.g., evaluating a math expression. The former works on
characters, the latter on tokens.</p>
<p style="margin: 0in; font-family: Calibri; font-size: 11pt;">2. Gained a
basic knowledge of automata: how to build a finite automaton (FA) for a given
regular expression, reduce an ε-NFA to an FA without ε-transitions, and, if the
result is still nondeterministic (an NFA), convert it to a DFA.</p>
<p style="margin: 0in; font-family: Calibri; font-size: 11pt;">3. Found that rolling
dice can be reduced to a state machine (SM).</p>
<p style="margin: 0in; font-family: calibri; font-size: 11pt;" lang="zh-CN">&nbsp;</p>
<p style="margin: 0in; font-family: Calibri; font-size: 11pt;">Todo in coming week:</p>
<p style="margin: 0in; font-family: Calibri; font-size: 11pt;">1. Study how
muParser implements lexical and syntax analysis internally.</p>
<p style="margin: 0in; font-family: Calibri; font-size: 11pt;">2. Write my
own math expression parser.</p>
<p style="margin: 0in; font-family: Calibri; font-size: 11pt;">3. Add 'if' and 'while' to
the parser my own way, then read more parsing documentation to find a better
solution.</p>
<p style="margin: 0in; font-family: calibri; font-size: 11pt;" lang="zh-CN">&nbsp;</p>
<p style="margin: 0in; font-family: Calibri; font-size: 11pt;">Time stamp:</p>
<p style="margin: 0in; font-family: Calibri; font-size: 11pt;">1:00 May.12.2008</p><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/diceidea/" target="_blank">diceidea</a> 2008-05-12 11:50</div>]]></description></item></channel></rss>