The :mod:`codecs` module contains functions to look up existing encodings and register new ones. Unless you want to implement a new encoding, you'll most often use the :func:`codecs.lookup(encoding)` function, which returns a 4-element tuple: ``(encode_func, decode_func, stream_reader, stream_writer)``.
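The following is a minimal sketch of the lookup in modern Python 3, where the Unicode string type is ``str`` and the 8-bit string type is ``bytes``. Current versions return a ``codecs.CodecInfo`` object rather than a plain tuple. It subclasses ``tuple``, so the four-way unpacking still works, and it also exposes the same components as attributes such as ``.encode`` and ``.streamwriter``. The variable names are only illustrative.

```python
import codecs

# lookup() accepts any registered name or alias of an encoding.
info = codecs.lookup('UTF-8')

# CodecInfo subclasses tuple, so it unpacks into the four components.
encode_func, decode_func, stream_reader, stream_writer = info

print(info.name)                    # utf-8 (the canonical codec name)
print(len(info))                    # 4
print(encode_func is info.encode)   # True: attributes mirror tuple items
```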
*encode_func* is a function that takes a Unicode string and returns a 2-tuple ``(string, length)``. *string* is an 8-bit string containing a portion (perhaps all) of the Unicode string converted into the given encoding, and *length* tells you how much of the Unicode string was converted.
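A short Python 3 sketch of calling the encoder directly. The sample string is an arbitrary choice. Note that *length* counts characters consumed from the input, not bytes produced, and the UTF-8 encoder always consumes the whole string.

```python
import codecs

encode_func = codecs.lookup('UTF-8').encode

unistr = '\u0660\u2000ab'          # 4 characters
data, length = encode_func(unistr)

# data is a bytes object: 2 + 3 + 1 + 1 = 7 bytes of UTF-8.
print(data)     # b'\xd9\xa0\xe2\x80\x80ab'
# length is the number of input characters that were converted.
print(length)   # 4
```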
*decode_func* is the opposite of *encode_func*, taking an 8-bit string and returning a 2-tuple ``(ustring, length)``, consisting of the resulting Unicode string *ustring* and the integer *length* telling how much of the 8-bit string was consumed.
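A Python 3 sketch of the decoding direction. The decoder returned by ``lookup()`` treats its input as complete, so it consumes every byte. To show a *length* smaller than the input, the second half uses the lower-level helper ``codecs.utf_8_decode()``, which is not one of the four tuple components. With its ``final`` argument set to ``False``, an incomplete multi-byte sequence at the end is left unconsumed.

```python
import codecs

decode_func = codecs.lookup('UTF-8').decode

data = b'\xd9\xa0\xe2\x80\x80ab'
ustring, length = decode_func(data)
print(ustring == '\u0660\u2000ab')  # True
print(length)                       # 7: every byte was consumed

# Only the first byte pairs fit: b'\xd9\xa0' is complete, while
# b'\xe2\x80' is the start of a 3-byte sequence and is held back.
partial, consumed = codecs.utf_8_decode(data[:4], 'strict', False)
print(partial == '\u0660')  # True
print(consumed)             # 2
```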
*stream_reader* is a class that supports decoding input from a stream. *stream_reader(file_obj)* returns an object that supports the :meth:`read`, :meth:`readline`, and :meth:`readlines` methods. These methods will all translate from the given encoding and return Unicode strings.
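The reader can wrap any binary file-like object. In this Python 3 sketch an ``io.BytesIO`` buffer is an assumption that stands in for a real file opened in binary mode, so the example runs without touching the filesystem.

```python
import codecs
import io

stream_reader = codecs.lookup('UTF-8').streamreader

# In-memory UTF-8 bytes in place of a file opened with mode 'rb'.
raw = io.BytesIO('first line\n\u0660 second line\n'.encode('utf-8'))
reader = stream_reader(raw)

first = reader.readline()   # one decoded line, line ending kept
rest = reader.readlines()   # the remaining lines as a list of str
print(repr(first))
print(rest == ['\u0660 second line\n'])  # True
```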
*stream_writer*, similarly, is a class that supports encoding output to a stream. *stream_writer(file_obj)* returns an object that supports the :meth:`write` and :meth:`writelines` methods. These methods expect Unicode strings, translating them to the given encoding on output.
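The writer side, sketched the same way with an in-memory ``io.BytesIO`` buffer in place of a file. ``writelines()`` joins its list and passes the result to ``write()``. Attributes the writer does not define, such as ``close()``, are forwarded to the wrapped stream.

```python
import codecs
import io

stream_writer = codecs.lookup('UTF-8').streamwriter

raw = io.BytesIO()
writer = stream_writer(raw)
writer.write('\u0660')           # str in, UTF-8 bytes out
writer.writelines(['ab', '\n'])  # same as writer.write('ab\n')

# Read the bytes before closing: closing the writer closes raw as well.
data = raw.getvalue()
writer.close()
print(data)  # b'\xd9\xa0ab\n'
```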
For example, the following code writes a Unicode string into a file, encoding it as UTF-8::

    import codecs
    unistr = u'\u0660\u2000ab ...'

    (UTF8_encode, UTF8_decode,
     UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

    output = UTF8_streamwriter(open('/tmp/output', 'wb'))
    output.write(unistr)
    output.close()
The following code would then read UTF-8 input from the file::

    input = UTF8_streamreader(open('/tmp/output', 'rb'))
    print repr(input.read())
    input.close()
Unicode-aware regular expressions are available through the :mod:`re` module, which has a new underlying implementation called SRE written by Fredrik Lundh of Secret Labs AB.
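A small Python 3 sketch of Unicode-aware matching. In Python 3, ``str`` patterns use Unicode semantics by default, so ``\d`` matches any decimal digit in Unicode category Nd, including ARABIC-INDIC DIGITs. In the Python 2 series the ``re.UNICODE`` flag was needed to get this behavior. The sample text is illustrative only.

```python
import re

text = 'price: \u0661\u0662\u0663 or 456'

# Default Unicode semantics: both the Arabic-Indic and ASCII digits match.
digits = re.findall(r'\d+', text)
print(digits == ['\u0661\u0662\u0663', '456'])  # True

# re.ASCII restricts \d, \w, \s and friends to their ASCII meanings.
print(re.findall(r'\d+', text, flags=re.ASCII))  # ['456']
```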