|
Original |
Translation |
|
27
|
``ord(u)``, where *u* is a 1-character regular or Unicode string, returns the code point of the character as an integer.
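A short illustration of the behaviour described above. The variable names are made up for the example:

```python
# ord() maps a one-character string to its integer code point.
ascii_code = ord('A')           # regular one-character string
arabic_zero = ord(u'\u0660')    # Unicode string: ARABIC-INDIC DIGIT ZERO

print(ascii_code)     # 65
print(arabic_zero)    # 1632, i.e. 0x660
```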
|
|
|
28
|
``unicode(string [, encoding] [, errors] )`` creates a Unicode string from an 8-bit string. ``encoding`` is a string naming the encoding to use. The ``errors`` parameter specifies the treatment of characters that are invalid for the current encoding; passing ``'strict'`` as the value causes an exception to be raised on any encoding error, while ``'ignore'`` causes errors to be silently ignored and ``'replace'`` uses U+FFFD, the official replacement character, in case of any problems.
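The three ``errors`` values can be compared side by side. This is a sketch under one assumption: it calls the ``decode()`` method of an 8-bit string, which takes the same ``encoding`` and ``errors`` arguments, so that it also runs on interpreters without the ``unicode()`` built-in. With the built-in, the last call would be spelled ``unicode(raw, 'ascii', 'replace')``.

```python
# An 8-bit string containing one byte (0xff) that is not valid ASCII.
raw = b'abc\xffdef'

# 'strict' raises an exception on the first undecodable byte.
try:
    raw.decode('ascii', 'strict')
    strict_failed = False
except UnicodeError:
    strict_failed = True

# 'ignore' silently drops the offending byte.
ignored = raw.decode('ascii', 'ignore')

# 'replace' substitutes U+FFFD, the official replacement character.
replaced = raw.decode('ascii', 'replace')
```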
|
|
|
29
|
The :keyword:`exec` statement and various built-ins such as ``eval()``, ``getattr()``, and ``setattr()`` now accept Unicode strings as well as regular strings. (It's possible that the process of fixing this missed some built-ins; if you find a built-in function that accepts strings but doesn't accept Unicode strings at all, please report it as a bug.)
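A small sketch of passing Unicode strings to these built-ins. ``Point`` is a hypothetical class used only for illustration, and the :keyword:`exec` statement is left out so the snippet does not depend on one interpreter's syntax:

```python
class Point(object):
    """Hypothetical container used only for this example."""
    pass

p = Point()

# Attribute names given as Unicode strings.
setattr(p, u'x', 3)
x_value = getattr(p, u'x')

# An expression given as a Unicode string.
total = eval(u'1 + 2')
```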
|
|
|
30
|
A new module, :mod:`unicodedata`, provides an interface to Unicode character properties. For example, ``unicodedata.category(u'A')`` returns the 2-character string ``'Lu'``: the ``'L'`` denotes that it is a letter, and the ``'u'`` that it is uppercase. ``unicodedata.bidirectional(u'\u0660')`` returns ``'AN'``, meaning that U+0660 is an Arabic number.
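The two calls mentioned above can be tried directly. The lowercase lookup is an extra case added for contrast:

```python
import unicodedata

# General category: 'L' = letter, 'u' = uppercase, 'l' = lowercase.
upper_cat = unicodedata.category(u'A')    # 'Lu'
lower_cat = unicodedata.category(u'a')    # 'Ll'

# Bidirectional class of U+0660 (ARABIC-INDIC DIGIT ZERO).
bidi = unicodedata.bidirectional(u'\u0660')    # 'AN'
```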
|
|
|
31
|
|
32
|
*encode_func* is a function that takes a Unicode string, and returns a 2-tuple ``(string, length)``. *string* is an 8-bit string containing a portion (perhaps all) of the Unicode string converted into the given encoding, and *length* tells you how much of the Unicode string was converted.
|
|
|
33
|
*decode_func* is the opposite of *encode_func*, taking an 8-bit string and returning a 2-tuple ``(ustring, length)``, consisting of the resulting Unicode string *ustring* and the integer *length* telling how much of the 8-bit string was consumed.
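A minimal sketch of the two functions in use. It assumes the four codec objects are obtained by unpacking the result of ``codecs.lookup()``, a call the text above does not name; the variable names are illustrative:

```python
import codecs

# Assumption: the lookup result unpacks into the four objects described here.
encode_func, decode_func, stream_reader, stream_writer = codecs.lookup('utf-8')

text = u'caf\u00e9'    # 4 characters; the last one needs 2 bytes in UTF-8

# encode_func returns (8-bit string, number of characters converted).
encoded, chars_converted = encode_func(text)

# decode_func returns (Unicode string, number of bytes consumed).
decoded, bytes_consumed = decode_func(encoded)
```

Here the two lengths differ because they count different things: characters of the Unicode input on the way out, bytes of the 8-bit input on the way back.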
|
|
|
34
|
*stream_reader* is a class that supports decoding input from a stream. *stream_reader(file_obj)* returns an object that supports the :meth:`read`, :meth:`readline`, and :meth:`readlines` methods. These methods will all translate from the given encoding and return Unicode strings.
|
|
|
35
|
*stream_writer*, similarly, is a class that supports encoding output to a stream. *stream_writer(file_obj)* returns an object that supports the :meth:`write` and :meth:`writelines` methods. These methods expect Unicode strings, translating them to the given encoding on output.
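The reader and writer classes can be exercised without touching the file system. This sketch assumes the classes come from ``codecs.lookup()`` and uses ``io.BytesIO`` as a stand-in for a file object; both choices are illustrative and not taken from the text:

```python
import codecs
import io

# Assumption: the last two items of the lookup result are the stream classes.
_, _, stream_reader, stream_writer = codecs.lookup('utf-8')

# Writing: Unicode strings go in, UTF-8 bytes reach the underlying stream.
buffer = io.BytesIO()
writer = stream_writer(buffer)
writer.write(u'\u0660 first line\n')
writer.writelines([u'second line\n'])
raw_bytes = buffer.getvalue()

# Reading: UTF-8 bytes come from the stream, Unicode strings come out.
reader = stream_reader(io.BytesIO(raw_bytes))
lines = reader.readlines()
```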
|
|
|
36
|
For example, the following code writes a Unicode string into a file, encoding it as UTF-8::
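A minimal sketch of such code, assuming the stream writer class is obtained from ``codecs.lookup()``. The sample string, the variable names, and the temporary output path are placeholders chosen for the example:

```python
import codecs
import os
import tempfile

unistr = u'\u0660\u2000ab'    # sample text, chosen for illustration

# Assumption: the lookup result unpacks into the four codec objects.
(utf8_encode, utf8_decode,
 utf8_streamreader, utf8_streamwriter) = codecs.lookup('utf-8')

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'output')

# Wrap a binary file object; write() encodes the Unicode string as UTF-8.
output = utf8_streamwriter(open(path, 'wb'))
output.write(unistr)
output.close()

# Read the raw bytes back to see what was written.
with open(path, 'rb') as f:
    written = f.read()
```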
|
|