Bash Process Substitution

In addition to the fairly common forms of input/output redirection, the shell recognizes something called process substitution. Although it is not documented as a form of input/output redirection, its syntax and effects are similar.

The syntax for process substitution is:

  <(list)
or
  >(list)
where each list is a command or a pipeline of commands. The effect of process substitution is to make each list act like a file. This is done by giving the list a name in the file system and then substituting that name in the command line. The list is given a name either by connecting it to a named pipe or by using a file in /dev/fd (if the operating system supports it). As a result, the command simply sees a file name and is unaware that it's reading from or writing to a command pipeline.
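To make the substitution visible, the sketch below hands a process substitution to a small shell function. The function name show_arg is made up for this illustration, and the exact path printed depends on the system (on Linux it is typically something like /dev/fd/63).

```shell
#!/usr/bin/env bash
# show_arg is a hypothetical helper: it reports the argument it was given
# and then reads that argument as if it were an ordinary file.
show_arg() {
  echo "argument received: $1"
  echo "contents:"
  cat "$1"
}

# The shell replaces <(...) with a file name before show_arg runs.
show_arg <(printf 'line one\nline two\n')
```

The function never learns that a command produced the data; it only opens the name it was given.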

 

To substitute a command pipeline for an input file the syntax is:

  command ... <(list) ...
To substitute a command pipeline for an output file the syntax is:
  command ... >(list) ...
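As a rough illustration of both forms, the sketch below uses cat for the input side and tee for the output side. The function names concat_two and both_cases are invented for this example.

```shell
#!/usr/bin/env bash
# Input side: cat is handed two file names, each backed by a command.
concat_two() {
  cat <(echo one) <(echo two)
}

# Output side: tee copies its input to stdout and to the "file" named by
# >(...); tr reads whatever tee writes there and prints it in upper case.
both_cases() {
  tee >(tr 'a-z' 'A-Z')
}

concat_two
printf 'hello\n' | both_cases
```

The command inside >(...) runs asynchronously, so when this is run at a terminal its output may appear slightly after the main command has returned.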

 

At first, process substitution may seem rather pointless. For example, you might imagine something simple like:

  uniq <(sort a)
to sort a file and then find the unique lines in it, but this is more commonly (and more conveniently) written as:
  sort a | uniq
The power of process substitution comes when you have multiple command pipelines that you want to connect to a single command.

 

For example, given the two files:

  # cat a
  e
  d
  c
  b
  a
  # cat b
  g
  f
  e
  d
  c
  b
To view the lines unique to each of these two unsorted files you might do something like this:
  # sort a | uniq >tmp1
  # sort b | uniq >tmp2
  # comm -3 tmp1 tmp2
  a
        f
        g
  # rm tmp1 tmp2
With process substitution we can do all this with one line:
  # comm -3 <(sort a | uniq) <(sort b | uniq)
  a
        f
        g
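For readers who want to try this without creating files by hand, here is a self-contained sketch. It recreates the two sample files in a scratch directory made with mktemp; the function name unique_lines_demo is an assumption of this example.

```shell
#!/usr/bin/env bash
# Recreate the sample files a and b in a scratch directory, then compare
# them without any intermediate files.
unique_lines_demo() {
  local dir
  dir=$(mktemp -d) || return 1
  printf '%s\n' e d c b a > "$dir/a"
  printf '%s\n' g f e d c b > "$dir/b"
  # Column 1: lines only in a. Column 2 (tab-indented): lines only in b.
  comm -3 <(sort "$dir/a" | uniq) <(sort "$dir/b" | uniq)
  rm -r "$dir"
}

unique_lines_demo
```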

 

Depending on your shell settings you may get an error message similar to:

  syntax error near unexpected token `('
when you try to use process substitution, particularly if you try to use it within a shell script. Process substitution is not a POSIX feature, so when bash is running in POSIX mode (for example, when it is invoked as sh) you may have to enable it with:
  set +o posix
Be careful not to try something like:
  if [[ $use_process_substitution -eq 1 ]]; then
    set +o posix
    comm -3 <(sort a | uniq) <(sort b | uniq)
  fi
The command set +o posix enables not only the execution of process substitution but also the recognition of its syntax. Bash parses a compound command such as this if block in its entirety before executing any part of it, so in the example above the shell tries to parse the process substitution before the set command has run, and still rejects the syntax as illegal.
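One way around this, sketched under the assumption that the script is run by bash, is to switch POSIX mode off in a separate top-level command before the shell reaches any compound command that contains process substitution. Bash reads a script one complete command at a time, so the set line has already run when the if block is parsed. The variable name comes from the example above; its default of 1 and the sample input are assumptions made for this sketch.

```shell
#!/usr/bin/env bash
# Turn POSIX mode off as its own top-level command ...
set +o posix

# ... so that it has taken effect before the block below is parsed.
use_process_substitution=${use_process_substitution:-1}   # default is an assumption
if [[ $use_process_substitution -eq 1 ]]; then
  comm -3 <(printf 'a\nb\n') <(printf 'b\nc\n')
fi
```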

 

Note that not all shells support process substitution; these examples work with bash.


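Process substitution is very similar to command substitution. Command substitution assigns the result of a command to a variable, as in dir_contents=`ls -al` or xref=$( grep word datafile). Process substitution feeds the output of one process to another process (in other words, it sends the result of one command to another command).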
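The contrast can be sketched in a few lines. The variable name first_word and the sample data are invented for this illustration.

```shell
#!/usr/bin/env bash
# Command substitution: the command's output becomes text in the command
# line, here the value assigned to a variable.
first_word=$(printf 'alpha beta\n' | cut -d' ' -f1)
echo "captured text: $first_word"

# Process substitution: the command's output becomes a readable "file",
# and the command line receives that file's name instead of the text.
paste -d, <(printf '1\n2\n') <(printf 'x\ny\n')
```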

Template for process substitution

A command enclosed in parentheses

  >(command)

  <(command)

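These initiate process substitution. It uses /dev/fd/<n> files to send the results of the process within the parentheses to another process. [1] (Translator's note: on modern UNIX-like operating systems the /dev/fd/n files correspond to file descriptors; the integer n is the number of an open file descriptor in the running process.)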

There is no space between the "<" or ">" and the parentheses. Adding a space there produces an error.
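The following sketch shows the three spellings side by side. The variable name space_result is made up for this example, and the invalid form is run in a child bash so that its syntax error does not abort the script.

```shell
#!/usr/bin/env bash
# No space between < and (: process substitution, cat receives a file name.
cat <(echo "passed as a file name")

# "< <(...)" is two things: an ordinary input redirection (the first <),
# a space, and then a process substitution naming the file to read from.
cat < <(echo "passed on standard input")

# "< (" with a space is neither of these, and bash rejects it.
if bash -c 'cat < (echo hi)' 2>/dev/null; then
  space_result="accepted"
else
  space_result="rejected"
fi
echo "with a space: $space_result"
```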

  bash$ echo >(true)
  /dev/fd/63

  bash$ echo <(true)
  /dev/fd/63

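Bash creates a pipe with two file descriptors, fIn and fOut. The stdin of true is connected to fOut (dup2(fOut, 0)), and Bash then passes /dev/fd/fIn as an argument to echo. On systems that lack /dev/fd/<n> files, Bash may use temporary files (named pipes) instead. (Thanks, S.C.)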
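A quick way to check what kind of file the substituted name refers to is the -p test, which is true for pipes. This is a minimal sketch; describe_path is a hypothetical helper written for this illustration.

```shell
#!/usr/bin/env bash
# describe_path is a hypothetical helper that reports whether its argument
# names a pipe (FIFO).
describe_path() {
  if [ -p "$1" ]; then
    echo "pipe"
  else
    echo "not a pipe"
  fi
}

describe_path <(true)      # the name produced by process substitution
describe_path /dev/null    # a device file, for contrast
```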

Process substitution can compare the output of two different commands, or even the output of the same command run with different options.

  bash$ comm <(ls -l) <(ls -al)
  total 12
  -rw-rw-r-- 1 bozo bozo 78 Mar 10 12:58 File0
  -rw-rw-r-- 1 bozo bozo 42 Mar 10 12:58 File2
  -rw-rw-r-- 1 bozo bozo 103 Mar 10 12:58 t2.sh
          total 20
          drwxrwxrwx 2 bozo bozo 4096 Mar 10 18:10 .
          drwx------ 72 bozo bozo 4096 Mar 10 17:58 ..
          -rw-rw-r-- 1 bozo bozo 78 Mar 10 12:58 File0
          -rw-rw-r-- 1 bozo bozo 42 Mar 10 12:58 File2
          -rw-rw-r-- 1 bozo bozo 103 Mar 10 12:58 t2.sh

Using process substitution to compare the contents of two directories (to see which filenames they share and which they do not):

  diff <(ls "$first_directory") <(ls "$second_directory")
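Here is a self-contained sketch of that comparison. The wrapper name compare_listings, the scratch directories, and the file names are all invented for the demonstration.

```shell
#!/usr/bin/env bash
# compare_listings is a hypothetical wrapper around the diff idiom above.
# diff exits with status 1 when the listings differ; that is not an error here.
compare_listings() {
  diff <(ls "$1") <(ls "$2")
}

# Scratch directories created only for this demonstration.
first_directory=$(mktemp -d)
second_directory=$(mktemp -d)
touch "$first_directory/common" "$first_directory/only_in_first"
touch "$second_directory/common" "$second_directory/only_in_second"

compare_listings "$first_directory" "$second_directory" || true
rm -r "$first_directory" "$second_directory"
```

Lines prefixed with "<" appear only in the first listing, and lines prefixed with ">" only in the second.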

Some other uses of process substitution:

  cat <(ls -l)
  # Same as:  ls -l | cat

  sort -k 9 <(ls -l /bin) <(ls -l /usr/bin) <(ls -l /usr/X11R6/bin)
  # Lists all the files in the 3 main 'bin' directories, sorted by filename.
  # Note that three distinct commands (count the three pairs of parentheses)
  # feed their output to 'sort'.

  diff <(command1) <(command2)    # Shows the differences between the two commands' output.

  tar cf >(bzip2 -c > file.tar.bz2) "$directory_name"
  # Calls "tar cf /dev/fd/?? $directory_name" and "bzip2 -c > file.tar.bz2".
  #
  # Because of the /dev/fd/<n> system feature,
  # the pipe between the two commands does not need to be named.
  #
  # This can be emulated:
  #
  mkfifo pipe                     # The named pipe has to exist first.
  bzip2 -c < pipe > file.tar.bz2 &
  tar cf pipe "$directory_name"
  rm pipe
  # or
  exec 3>&1
  tar cf /dev/fd/4 "$directory_name" 4>&1 >&3 3>&- | bzip2 -c > file.tar.bz2 3>&-
  exec 3>&-

  # Thanks, Stephane Chazelas
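To see the named-pipe emulation run end to end without tar or bzip2, the sketch below does the same plumbing with tr. The function name upper_via_fifo and the scratch directory are assumptions of this example.

```shell
#!/usr/bin/env bash
# upper_via_fifo is a hypothetical demo: it sets up by hand the kind of
# plumbing that process substitution arranges automatically.
upper_via_fifo() {
  local dir reader
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/pipe"                               # the pipe needs a name
  tr 'a-z' 'A-Z' < "$dir/pipe" > "$dir/out.txt" &  # reader, in the background
  reader=$!
  printf 'hello\n' > "$dir/pipe"                   # writer
  wait "$reader"                                   # let the reader finish
  cat "$dir/out.txt"
  rm -r "$dir"
}

upper_via_fifo
```

Compared with process substitution, the manual version has to create, name, and remove the pipe itself, and has to wait for the background reader explicitly.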

A reader sent in the following interesting example of process substitution.

  # Code fragment taken from the SuSE distribution:

  while read des what mask iface; do
    :   # Some commands omitted here ...
  done < <(route -n)


  # To test it, let's make it do something.
  while read des what mask iface; do
    echo $des $what $mask $iface
  done < <(route -n)

  # Output:
  # Kernel IP routing table
  # Destination Gateway Genmask Flags Metric Ref Use Iface
  # 127.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 lo


  # As Stephane Chazelas points out, an easier-to-understand equivalent is:
  route -n |
    while read des what mask iface; do   # Variables are set from the output of the pipe.
      echo $des $what $mask $iface
    done   # This yields the same output as above.
           # However, as Ulrich Gayer points out . . .
           #+ this simpler equivalent runs the while loop in a subshell,
           #+ so the variables disappear when the pipe terminates.


  # Furthermore, Filip Moritz explains that there is a subtle difference
  #+ between the two examples above, as the following shows.

  (
  route -n | while read x; do ((y++)); done
  echo $y   # $y is still unset

  while read x; do ((y++)); done < <(route -n)
  echo $y   # $y holds the number of lines of output of route -n
  )

  # More generally (translator's note: the original author omitted the "#"
  # on this line, presumably by mistake):
  (
  : | x=x
  # seems to start a subshell, like
  : | ( x=x )
  # while
  x=x < <(:)
  # does not
  )

  # This is useful when parsing csv and the like.
  # That is, in effect, what the original SuSE code fragment does.
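The subshell difference can be reproduced without route. The sketch below assumes bash's default settings (in particular, the lastpipe option is off); sample_lines is a stand-in for "route -n", and the function names are invented for this illustration.

```shell
#!/usr/bin/env bash
# sample_lines stands in for "route -n" so that the sketch runs anywhere.
sample_lines() { printf 'a\nb\nc\n'; }

count_with_pipe() {
  local n=0
  # The while loop is part of a pipeline, so it runs in a subshell;
  # its changes to n are lost when the pipeline ends.
  sample_lines | while read -r line; do n=$((n + 1)); done
  echo "$n"
}

count_with_process_substitution() {
  local n=0
  # The loop runs in the current shell; only its stdin is redirected.
  while read -r line; do n=$((n + 1)); done < <(sample_lines)
  echo "$n"
}

echo "pipe:                 $(count_with_pipe)"
echo "process substitution: $(count_with_process_substitution)"
```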

Notes

[1] This has the same effect as a named pipe (temp file), and, in fact, named pipes are also used in process substitution.