Subsections


7. 输入和输出 Input and Output

There are several ways to present the output of a program; data can be printed in a human-readable form, or written to a file for future use. This chapter will discuss some of the possibilities.

有几种方法可以表现程序的输出结果;数据可以用可读的结构打印,也可以写入文件供以后使用。本章将会讨论几种可行的做法。


7.1 设计输出格式 Fancier Output Formatting

So far we've encountered two ways of writing values: expression statements and the print statement. (A third way is using the write() method of file objects; the standard output file can be referenced as sys.stdout. See the Library Reference for more information on this.)

我们有两种大相径庭的输出值方法:表达式语句print 语句。(第三种访求是使用文件对象的 write() 方法,标准文件输出可以参考 sys.stdout。详细内容参见库参考手册。)

Often you'll want more control over the formatting of your output than simply printing space-separated values. There are two ways to format your output; the first way is to do all the string handling yourself; using string slicing and concatenation operations you can create any lay-out you can imagine. The standard module string contains some useful operations for padding strings to a given column width; these will be discussed shortly. The second way is to use the % operator with a string as the left argument. The % operator interprets the left argument much like a sprintf()-style format string to be applied to the right argument, and returns the string resulting from this formatting operation.

可能你经常想要对输出格式做一些比简单的打印空格分隔符更为复杂的控制。有两种方法可以格式化输出。第一种是由你来控制整个字符串,使用字符切片和联接操作就可以创建出任何你想要的输出形式。标准模块 string 包括了一些操作,将字符串填充入给定列时,这些操作很有用。随后我们会讨论这部分内容。第二种方法是使用 % 操作符,以某个字符串做为其左参数。 % 操作符将左参数解释为类似于 sprintf() 风格的格式字符串,并作用于右参数,从该操作中返回格式化的字符串。

One question remains, of course: how do you convert values to strings? Luckily, Python has ways to convert any value to a string: pass it to the repr() or str() functions. Reverse quotes (``) are equivalent to repr(), but their use is discouraged.

当然,还有一个问题,如何将(不同的)值转化为字符串?很幸运,Python总是把任意值传入 repr()str() 函数,转为字符串。相对而言引号 (``)等价于repr(),不过不提倡这样用。

The str() function is meant to return representations of values which are fairly human-readable, while repr() is meant to generate representations which can be read by the interpreter (or will force a SyntaxError if there is not equivalent syntax). For objects which don't have a particular representation for human consumption, str() will return the same value as repr(). Many values, such as numbers or structures like lists and dictionaries, have the same representation using either function. Strings and floating point numbers, in particular, have two distinct representations.

函数 str() 用于将值转化为适于人阅读的形式,而 repr() 转化为供解释器读取的形式(如果没有等价的语法,则会发生 SyntaxError 异常) 某对象没有适于人阅读的解释形式的话, str() 会返回与 repr() 等同的值。很多类型,诸如数值或链表、字典这样的结构,针对各函数都有着统一的解读方式。字符串和浮点数,有着独特的解读方式。

Some examples:

>>> s = 'Hello, world.'
>>> str(s)
'Hello, world.'
>>> repr(s)
"'Hello, world.'"
>>> str(0.1)
'0.1'
>>> repr(0.1)
'0.10000000000000001'
>>> x = 10 * 3.25
>>> y = 200 * 200
>>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'
>>> print s
The value of x is 32.5, and y is 40000...
>>> # The repr() of a string adds string quotes and backslashes:
... hello = 'hello, world\n'
>>> hellos = repr(hello)
>>> print hellos
'hello, world\n'
>>> # The argument to repr() may be any Python object:
... repr((x, y, ('spam', 'eggs')))
"(32.5, 40000, ('spam', 'eggs'))"
>>> # reverse quotes are convenient in interactive sessions:
... `x, y, ('spam', 'eggs')`
"(32.5, 40000, ('spam', 'eggs'))"

Here are two ways to write a table of squares and cubes:

以下两种方法可以输出平方和立方表:

>>> for x in range(1, 11):
...     print repr(x).rjust(2), repr(x*x).rjust(3),
...     # Note trailing comma on previous line
...     print repr(x*x*x).rjust(4)
...
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512
 9  81  729
10 100 1000
>>> for x in range(1,11):
...     print '%2d %3d %4d' % (x, x*x, x*x*x)
...
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512
 9  81  729
10 100 1000

(Note that one space between each column was added by the way print works: it always adds spaces between its arguments.)

(需要注意的是使用 print 方法时每两列之间有一个空格:它总是在参数之间加一个空格。)

This example demonstrates the rjust() method of string objects, which right-justifies a string in a field of a given width by padding it with spaces on the left. There are similar methods ljust() and center(). These methods do not write anything, they just return a new string. If the input string is too long, they don't truncate it, but return it unchanged; this will mess up your column lay-out but that's usually better than the alternative, which would be lying about a value. (If you really want truncation you can always add a slice operation, as in "x.ljust( n)[:n]".)

以上是一个 rjust() 函数的演示,这个函数把字符串输出到一列,并通过向左侧填充空格来使其右对齐。类似的函数还有 ljust()center()。这些函数只是输出新的字符串,并不改变什么。如果输出的字符串太长,它们也不会截断它,而是原样输出,这会使你的输出格式变得混乱,不过总强过另一种选择(截断字符串),因为那样会产生错误的输出值。(如果你确实需要截断它,可以使用切片操作,例如:" "x.ljust( n)[:n]"。)

There is another method, zfill(), which pads a numeric string on the left with zeros. It understands about plus and minus signs:

还有一个函数, zfill() 它用于向数值的字符串表达左侧填充0。该函数可以正确理解正负号:

>>> '12'.zfill(5)
'00012'
>>> '-3.14'.zfill(7)
'-003.14'
>>> '3.14159265359'.zfill(5)
'3.14159265359'

Using the % operator looks like this:

可以如下这样使用 % 操作符:

>>> import math
>>> print 'The value of PI is approximately %5.3f.' % math.pi
The value of PI is approximately 3.142.

If there is more than one format in the string, you need to pass a tuple as right operand, as in this example:

如果有超过一个的字符串要格式化为一体,就需要将它们传入一个元组做为右值,如下所示:

>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}
>>> for name, phone in table.items():
...     print '%-10s ==> %10d' % (name, phone)
...
Jack       ==>       4098
Dcab       ==>       7678
Sjoerd     ==>       4127

Most formats work exactly as in C and require that you pass the proper type; however, if you don't you get an exception, not a core dump. The %s format is more relaxed: if the corresponding argument is not a string object, it is converted to string using the str() built-in function. Using * to pass the width or precision in as a separate (integer) argument is supported. The C formats %n and %p are not supported.

大多数类 C 的格式化操作都需要你传入适当的类型,不过如果你没有定义异常,也不会有什么从内核中主动的弹出来。(however, if you don't you get an exception, not a core dump)使用 %s 格式会更轻松些:如果对应的参数不是字符串,它会通过内置的 str() 函数转化为字符串。Python支持用 * 作为一个隔离(整型的)参数来传递宽度或精度。Python 不支持 C的 %n%p 操作符。

If you have a really long format string that you don't want to split up, it would be nice if you could reference the variables to be formatted by name instead of by position. This can be done by using form %(name)format, as shown here:

如果可以逐点引用要格式化的变量名,就可以产生符合真实长度的格式化字符串,不会产生间隔。这一效果可以通过使用form %(name)format 结构来实现:

>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678}
>>> print 'Jack: %(Jack)d; Sjoerd: %(Sjoerd)d; Dcab: %(Dcab)d' % table
Jack: 4098; Sjoerd: 4127; Dcab: 8637678

This is particularly useful in combination with the new built-in vars() function, which returns a dictionary containing all local variables.

这个技巧在与新的内置函数 vars() 组合使用时非常有用,该函数返回一个包含所有局部变量的字典。


7.2 读写文件 Reading and Writing Files

open() returns a file object, and is most commonly used with two arguments: "open(filename, mode)".

open() 返回一个文件,通常的用法需要两个参数: "open(filename, mode)"。

>>> f=open('/tmp/workfile', 'w')
>>> print f
<open file '/tmp/workfile', mode 'w' at 80a0960>

The first argument is a string containing the filename. The second argument is another string containing a few characters describing the way in which the file will be used. mode can be 'r' when the file will only be read, 'w' for only writing (an existing file with the same name will be erased), and 'a' opens the file for appending; any data written to the file is automatically added to the end. 'r+' opens the file for both reading and writing. The mode argument is optional; 'r' will be assumed if it's omitted.

第一个参数是一个标识文件名的字符串。第二个参数是由有限的字母组成的字符串,描述了文件将会被如何使用。可选的模式 有: 'r' ,此选项使文件只读; 'w',此选项使文件只写(对于同名文件,该操作使原有文件被覆盖); 'a' ,此选项以追加方式打开文件; 'r+' ,此选项以读写方式打开文件;如果没有指定,默认为 'r' 模式。

On Windows and the Macintosh, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it'll corrupt binary data like that in JPEGs or .EXE files. Be very careful to use binary mode when reading and writing such files. (Note that the precise semantics of text mode on the Macintosh depends on the underlying C library being used.)

在Windows 和 Macintosh平台上, 'b'模式以二进制方式打开文件,所以可能会有类似于 'rb''wb''r+b' 等等模式组合。Windows平台上文本文件与二进制文件是有区别的,读写文本文件时,行尾会自动添加行结束符。这种后台操作方式对ASCII 文本文件没有什么问题,但是操作 JPEG 或 .EXE这样的二进制文件时就会产生破坏。在操作这些文件时一定要记得以二进制模式打开。(需要注意的是Mactiontosh 平台上的文本模式依赖于其使用的底层C库)。


7.2.1 文件对象(file object)的方法 Methods of File Objects

The rest of the examples in this section will assume that a file object called f has already been created.

本节中的示例都默认文件对象 f 已经创建。

To read a file's contents, call f.read(size), which reads some quantity of data and returns it as a string. size is an optional numeric argument. When size is omitted or negative, the entire contents of the file will be read and returned; it's your problem if the file is twice as large as your machine's memory. Otherwise, at most size bytes are read and returned. If the end of the file has been reached, f.read() will return an empty string ("").

要读取文件内容,需要调用 f.read(size),该方法读取若干数量的数据并以字符串形式返回其内容,字符串长度为数值size 所指定的大小。如果没有指定 size或者指定为负数,就会读取并返回整个文件。当文件大小为当前机器内存两倍时,就会产生问题。正常情况下,会尽可能按比较大的size 读取和返回数据。如果到了文件末尾,f.read()会返回一个空字符串("")。

>>> f.read()
'This is the entire file.\n'
>>> f.read()
''

f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn't end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.

f.readline()从文件中读取单独一行,字符串结尾会自动加上一个换行符,只有当文件最后一行没有以换行符结尾时,这一操作才会被忽略。这样返回值就不会有什么混淆不清,如果如果 f.readline()返回一个空字符串,那就表示到达了文件末尾,如果是一个空行,就会描述为'\n´ ,一个只包含换行符的字符串。

>>> f.readline()
'This is the first line of the file.\n'
>>> f.readline()
'Second line of the file\n'
>>> f.readline()
''

f.readlines() returns a list containing all the lines of data in the file. If given an optional parameter sizehint, it reads that many bytes from the file and enough more to complete a line, and returns the lines from that. This is often used to allow efficient reading of a large file by lines, but without having to load the entire file in memory. Only complete lines will be returned.

f.readlines()返回一个列表,其中包含了文件中所有的数据行。如果给定了sizehint参数,就会读入多于一行的比特数,从中返回多行文本。这个功能通常用于高效读取大型行文件,避免了将整个文件读入内存。这种操作只返回完整的行。

>>> f.readlines()
['This is the first line of the file.\n', 'Second line of the file\n']

f.write(string) writes the contents of string to the file, returning None.

f.write(string)string 的内容写入文件,返回 None

>>> f.write('This is a test\n')

To write something other than a string, it needs to be converted to a string first:

如果需要写入字符串以外的数据,就要先把这些数据转换为字符串。

>>> value = ('the answer', 42)
>>> s = str(value)
>>> f.write(s)

f.tell() returns an integer giving the file object's current position in the file, measured in bytes from the beginning of the file. To change the file object's position, use "f.seek(offset, from_what)". The position is computed from adding offset to a reference point; the reference point is selected by the from_what argument. A from_what value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. from_what can be omitted and defaults to 0, using the beginning of the file as the reference point.

f.tell()返回一个整数,代表文件对象在文件中的指针位置,该数值计量了自文件开头到指针处的比特数。需要改变文件对象指针话话,使用"f.seek(offset,from_what)" 。指针在该操作中从指定的引用位置移动offset 比特,引用位置由 from_what 参数指定。 from_what值为0表示自文件起初处开始,1表示自当前文件指针位置开始,2表示自文件末尾开始。 from_what 可以忽略,其默认值为零,此时从文件头开始。

>>> f = open('/tmp/workfile', 'r+')
>>> f.write('0123456789abcdef')
>>> f.seek(5)     # Go to the 6th byte in the file
>>> f.read(1)
'5'
>>> f.seek(-3, 2) # Go to the 3rd byte before the end
>>> f.read(1)
'd'

When you're done with a file, call f.close() to close it and free up any system resources taken up by the open file. After calling f.close(), attempts to use the file object will automatically fail.

文件使用完后,调用 f.close()可以关闭文件,释放打开文件后占用的系统资源。调用 f.close()之后,再调用文件对象会自动引发错误。

>>> f.close()
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: I/O operation on closed file

File objects have some additional methods, such as isatty() and truncate() which are less frequently used; consult the Library Reference for a complete guide to file objects.

文件对象还有一些不太常用的附加方法,比如 isatty()truncate() 在库参考手册中有文件对象的完整指南。


7.2.2 pickle 模块 pickle Module

Strings can easily be written to and read from a file. Numbers take a bit more effort, since the read() method only returns strings, which will have to be passed to a function like int(), which takes a string like '123' and returns its numeric value 123. However, when you want to save more complex data types like lists, dictionaries, or class instances, things get a lot more complicated.

我们可以很容易的读写文件中的字符串。数值就要多费点儿周折,因为read() 方法只会返回字符串,应该将其传入 int()方法中,就可以将 '123'这样的字符转为对应的数值123。不过,当你需要保存更为复杂的数据类型,例如链表、字典,类的实例,事情就会变得更复杂了。

Rather than have users be constantly writing and debugging code to save complicated data types, Python provides a standard module called pickle. This is an amazing module that can take almost any Python object (even some forms of Python code!), and convert it to a string representation; this process is called pickling. Reconstructing the object from the string representation is called unpickling. Between pickling and unpickling, the string representing the object may have been stored in a file or data, or sent over a network connection to some distant machine.

好在用户不必要非得自己编写和调试保存复杂数据类型的代码。 Python提供了一个名为 pickle的标准模块。这是一个令人赞叹的模块,几乎可以把任何 Python对象 (甚至是一些 Python 代码段!)表达为为字符串,这一过程称之为封装pickling)。从字符串表达出重新构造对象称之为拆封unpickling)。封装状态中的对象可以存储在文件或对象中,也可以通过网络在远程的机器之间传输。

If you have an object x, and a file object f that's been opened for writing, the simplest way to pickle the object takes only one line of code:

如果你有一个对象 x ,一个以写模式打开的文件对象 f,封装对像的最简单的方法只需要一行代码:

pickle.dump(x, f)

To unpickle the object again, if f is a file object which has been opened for reading:

如果 f是一个以读模式打开的文件对象,就可以重装拆封这个对象:

x = pickle.load(f)

(There are other variants of this, used when pickling many objects or when you don't want to write the pickled data to a file; consult the complete documentation for pickle in the Python Library Reference.)

(如果不想把封装的数据写入文件,这里还有一些其它的变化可用。完整的pickle 文档请见Python 库参考手册)。

pickle is the standard way to make Python objects which can be stored and reused by other programs or by a future invocation of the same program; the technical term for this is a persistent object. Because pickle is so widely used, many authors who write Python extensions take care to ensure that new data types such as matrices can be properly pickled and unpickled.

pickle 是存储 Python 对象以供其它程序或其本身以后调用的标准方法。提供这一组技术的是一个持久化对象( persistent object )。因为 pickle 的用途很广泛,很多 Python 扩展的作者都非常注意类似矩阵这样的新数据类型是否适合封装和拆封