Prayer

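A closer look at awk
1. awk is not just a tool; it is a programming language. Its relational (comparison) operators are:
<, <=, >, >=, ==, !=, ~ (matches a regular expression), !~ (does not match a regular expression)
2. The logical operators are: && (and), || (or), ! (not)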
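As a small illustration of these operators, the sketch below filters a few "#"-separated records. The sample data and the variable names (data, high_seniors, matched) are made up for this example and do not come from the original notes.

```shell
# Hypothetical sample data: name#score#class
data='alice#85#senior
bob#56#junior
carol#98#senior
dave#69#junior'

# Relational and logical operators: seniors with a score of at least 80.
high_seniors=$(printf '%s\n' "$data" |
    awk -F '#' '$2 >= 80 && $3 == "senior" {print $1}')

# Regex match (~) and non-match (!~): names starting with a, b or c
# that do not end in "l".
matched=$(printf '%s\n' "$data" |
    awk -F '#' '$1 ~ /^[a-c]/ && $1 !~ /l$/ {print $1}')

echo "$high_seniors"   # alice, carol (one per line)
echo "$matched"        # alice, bob (one per line)
```

Here the condition is written as an awk pattern in front of the action; the same tests could equally be placed inside an if (...) statement.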
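3. awk built-in variables
ARGC number of command-line arguments
ARGV array of command-line arguments
ENVIRON associative array holding the environment variables
FILENAME name of the file currently being read
FNR number of records read so far from the current file
FS input field separator; setting it is equivalent to the -F command-line option
NF number of fields in the current record
NR total number of records read so far
OFS output field separator
ORS output record separator
RS input record separator
example: awk -F '#' '{print NF, NR, $0, ARGV[1], ENVIRON["USER"]}' test.txt Uses # as the field separator and prints, for each line, the number of fields, the number of records read so far, the whole line, the first command-line argument (ARGV[0] is awk itself, so ARGV[1] is test.txt), and the user name taken from the USER environment variable.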
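The difference between NR and FNR only shows up with more than one input file. The following sketch uses two throwaway files created with mktemp; the file names one.txt and two.txt, the GREETING variable, and the shell variable names are assumptions made for this illustration.

```shell
# Create two small hypothetical input files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'a#b\nc#d#e\n' > "$tmpdir/one.txt"
printf 'f#g\n' > "$tmpdir/two.txt"

# FS and OFS are set in BEGIN; NR keeps counting across files,
# FNR restarts at 1 for each file.
vars_out=$(cd "$tmpdir" &&
    awk 'BEGIN {FS = "#"; OFS = "|"} {print FILENAME, NR, FNR, NF, $1}' one.txt two.txt)

# ARGV[0] is the program name, so two arguments give ARGC == 3.
argc_out=$(awk 'BEGIN {print ARGC, ARGV[1]}' first second)

# ENVIRON is indexed by the name of the environment variable.
env_out=$(GREETING=hello awk 'BEGIN {print ENVIRON["GREETING"]}')

echo "$vars_out"
echo "$argc_out"
echo "$env_out"
rm -rf "$tmpdir"
```

With these inputs the first command prints one.txt|1|1|2|a, one.txt|2|2|3|c and two.txt|3|1|2|f, the second prints 3 first, and the third prints hello.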
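4. awk string functions
gsub(r,s) replaces every match of r in $0 with s
gsub(r,s,t) replaces every match of r in t with s
index(s,t) returns the position of the first occurrence of the string t in s, or 0 if t does not occur
length(s) returns the length of s
match(s,r) returns the position in s where the regular expression r first matches, or 0 if there is no match
split(s,a,fs) splits s into the array a using the separator fs and returns the number of elements
sprintf(fmt,exp) returns exp formatted according to fmt
sub(r,s) replaces the leftmost longest match of r in $0 with s
substr(s,p) returns the part of s starting at position p
substr(s,p,n) returns the part of s of length n starting at position p
example: awk -F '#' '{if (gsub("s","S",$1)) print $1}' test.txt Uses # as the field separator, replaces every "s" in the first field of each line with "S", and prints the first field of those lines where at least one replacement was made (gsub returns the number of replacements).
awk -F '#' '{print (index($1,"s"))}' test.txt Prints the position of "s" in the first field of each line; 0 means the field does not contain it.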
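To see the functions side by side, here is a minimal sketch that applies each of them to one fixed string inside a BEGIN block, so no input file is needed. The string "hello world" and the variable names are chosen only for illustration.

```shell
str_out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                    # 11
    print index(s, "o")                # 5
    print substr(s, 7)                 # world
    print substr(s, 1, 5)              # hello
    pos = match(s, /o+/)               # match also sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH         # 5 5 1
    t = s; n = gsub(/o/, "0", t)       # gsub returns the number of replacements
    print n, t                         # 2 hell0 w0rld
    u = s; sub(/l+/, "L", u)           # only the leftmost longest match is replaced
    print u                            # heLo world
    print sprintf("%05.1f", 3.14159)   # 003.1
    n = split("as#qw#1234", parts, "#")
    print n, parts[3]                  # 3 1234
}')
echo "$str_out"
```

Copying s into t and u first keeps the original string intact, because sub and gsub modify their target in place.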
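5. Escape sequences in awk
\b backspace
\t tab
\f form feed
\ddd character with the octal value ddd
\n newline
\c any other special character c taken literally, e.g. \\ is a backslash
\r carriage return
example: awk -F '#' '{print (index($2,"s")), "\t", $2}' test.txt Prints the position of "s" in the second field, a tab character, and the contents of the second field. The string "s" must be written with double quotes: a single quote inside the program would end the shell's quoting of the awk program.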
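A quick way to check a few of these escape sequences is to print them from a BEGIN block. The strings in this sketch are arbitrary examples, not taken from the original notes.

```shell
esc_out=$(awk 'BEGIN {
    print "a\tb"          # a, a tab character, b
    print "back\\slash"   # a single literal backslash
    print "\101"          # octal 101 is the ASCII code of A
}')
echo "$esc_out"
```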
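6. printf format specifiers
%c ASCII character
%d integer
%f floating-point number
%e floating-point number in scientific notation
%g awk chooses between the e and f conversions, whichever is shorter
%o octal number
%s string
%x hexadecimal number
example: awk -F '#' '{printf "%c\n",$1}' test.txt Prints the first field of each line of test.txt as a character: if the field is a number, the character with that code is printed; otherwise the first character of the field is printed. (The "," between "%c\n" and $1 is required.)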
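The sketch below exercises a subset of these specifiers with literal arguments, plus field widths, which the list does not cover. The values are arbitrary and the variable name fmt_out is an assumption of this example.

```shell
fmt_out=$(awk 'BEGIN {
    printf "%c|%d|%s\n", 65, 42.9, "str"      # %c turns 65 into A; %d truncates 42.9
    printf "%.2f|%o|%x\n", 3.14159, 8, 255    # two decimals, octal, hexadecimal
    printf "%-6s|%6s|\n", "ab", "cd"          # left- and right-aligned in width 6
}')
echo "$fmt_out"
```

This prints A|42|str, then 3.14|10|ff, then the two strings padded to six columns each. Unlike print, printf adds no newline on its own, which is why each format ends in \n.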
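7. awk arrays
Arrays in awk are called associative arrays because a subscript can be either a number or a string. Arrays do not have to be declared in advance, and no size has to be given. Elements are initialized to 0 or to the empty string, depending on the context.
awk 'BEGIN {print split("as#qw#1234",array2,"#")}' Splits "as#qw#1234" on "#" into the array array2 and prints the number of elements (3).
awk 'BEGIN {split("as#qw#1234",array2,"#"); print array2[1]}' Performs the same split, then prints the first element of the array, "as". (Note: split fills the array starting at subscript 1.)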
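A few more array operations fit naturally here: iterating over the result of split, relying on the implicit 0 initialization, testing membership with in, and removing an element with delete. This is only a sketch; the array names parts and count are made up for the example.

```shell
arr_out=$(awk 'BEGIN {
    n = split("as#qw#1234", parts, "#")
    for (i = 1; i <= n; i++)         # split numbers the elements 1..n
        print i, parts[i]
    count["x"]++; count["x"]++       # an unset element counts as 0 in numeric context
    print count["x"]
    status = ("y" in count) ? "present" : "absent"   # "in" does not create the element
    print status
    delete count["x"]
    status = ("x" in count) ? "present" : "absent"
    print status
}')
echo "$arr_out"
```

The loop uses a numeric index rather than for (k in parts), because the order in which for ... in visits subscripts is not specified.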
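8. Worked examples:
(1) awk '{if ($1~/^21[0-9]/) print $0}' test.txt |wc -l Selects the lines of test.txt whose first field begins with 210 through 219 and pipes them to wc -l to count them.
(2) awk '{if ($4~/^\[07\/Jul\/2004/) print $0 }' test.txt | awk '{if ($7=="/htm/free_call.php") print $0} ' |wc -l Selects the lines of test.txt whose fourth field begins with "[07/Jul/2004" and whose seventh field is "/htm/free_call.php", and counts them. The purpose of the command is to count the requests for /htm/free_call.php on 7 July 2004.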
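The two awk processes and wc -l in example (2) can also be folded into a single awk program that combines both conditions with && and counts in an END block. The sketch below assumes a common-log-style layout in which the timestamp is field 4 and the request path is field 7; the four log lines are invented for this illustration.

```shell
# Hypothetical access-log lines (whitespace-separated fields).
log='1.2.3.4 - - [07/Jul/2004:10:00:00 +0800] "GET /htm/free_call.php HTTP/1.1" 200 512
1.2.3.5 - - [07/Jul/2004:10:05:00 +0800] "GET /index.html HTTP/1.1" 200 1024
1.2.3.6 - - [08/Jul/2004:09:00:00 +0800] "GET /htm/free_call.php HTTP/1.1" 200 512
1.2.3.7 - - [07/Jul/2004:11:00:00 +0800] "GET /htm/free_call.php HTTP/1.1" 200 512'

# Both conditions in one pattern; n + 0 prints 0 instead of an empty line
# when nothing matched.
hits=$(printf '%s\n' "$log" |
    awk '$4 ~ /^\[07\/Jul\/2004/ && $7 == "/htm/free_call.php" {n++} END {print n + 0}')
echo "$hits"   # 2
```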
(3) The contents of awk_array.sh are as follows:
#!/bin/awk -f
# The line above runs this file with awk; without it the shell would try to interpret the file itself and fail.
# awk_array.sh
BEGIN {    # actions run before any input is read
    FS = "#"
    score["0-60"] = 0    # element of score indexed by the string "0-60" (awk allows string subscripts)
    score["60-70"] = 0
    score["70-80"] = 0
    score["80-90"] = 0
    score["90-100"] = 0
    student["junior"] = 0
    student["senior"] = 0
}
{
    if ($1 < 60)    # the first field is less than 60
        score["0-60"]++    # add 1 to the element indexed by "0-60"
    if ($1 >= 60 && $1 < 70)
        score["60-70"]++
    if ($1 >= 70 && $1 < 80)
        score["70-80"]++
    if ($1 >= 80 && $1 < 90)
        score["80-90"]++
    if ($1 >= 90 && $1 <= 100)
        score["90-100"]++
}
{    # senior_junior takes each subscript of student in turn (one iteration per subscript)
    for (senior_junior in student)
        if ($2 == senior_junior)
            student[senior_junior]++
}
END {
    for (number in score)
        print "the score", number, "has", score[number], "students"
    for (senior_junior in student)
        print "The class has", student[senior_junior], senior_junior, "students"
}
Suppose the file grade.txt contains:
85#senior
87#junior
78#junior
69#senior
56#junior
98#senior
83#senior
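Running ./awk_array.sh grade.txt then produces the following output (the order in which for ... in visits the subscripts is not specified, so the lines may appear in a different order with another awk implementation):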
the score 0-60 has 1 students
the score 70-80 has 1 students
the score 90-100 has 1 students
the score 60-70 has 1 students
the score 80-90 has 3 students
The class has 4 senior students
The class has 3 junior students
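Because the bands in the output above appear in no particular order, one possible variation is to keep the band names in a numerically indexed array and loop over that instead. This is a sketch of an alternative, not the script from the notes: the band array, the else-if chain, and the shell variable names are assumptions of this example, while the grade data is the grade.txt content shown earlier.

```shell
grades='85#senior
87#junior
78#junior
69#senior
56#junior
98#senior
83#senior'

report=$(printf '%s\n' "$grades" | awk -F '#' '
BEGIN { n = split("0-60 60-70 70-80 80-90 90-100", band, " ") }
{
    if      ($1 < 60)   score["0-60"]++
    else if ($1 < 70)   score["60-70"]++
    else if ($1 < 80)   score["70-80"]++
    else if ($1 < 90)   score["80-90"]++
    else if ($1 <= 100) score["90-100"]++
}
END {
    # band[1..n] fixes the output order; + 0 prints 0 for empty bands
    for (i = 1; i <= n; i++)
        print "the score", band[i], "has", score[band[i]] + 0, "students"
}')
echo "$report"
```

With the grade data above, the five lines come out in ascending band order with the counts 1, 1, 1, 3 and 1.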
