Jython and IronPython run on platforms where strings are Unicode-capable by default. Both implementations have chosen to make str essentially an alias for unicode in Python source code. The bytes type, introduced in PEP 358 as part of the transition to a fully Unicode Python 3.0, is unambiguously a sequence of single-byte values. We can see in the table below that Jython and IronPython are caught between what is most practical for interoperability with existing code and their host platforms on the one hand, and the Right Thing as delivered by Python 3.0 on the other.
|         | Jython 2.5 | IronPython 2.6 | CPython 2.6 | CPython 3.0 |
|---------|------------|----------------|-------------|-------------|
| str     | multibyte  | multibyte      | byte        | multibyte   |
| unicode | multibyte  | multibyte      | multibyte   | multibyte   |
| bytes   | byte       | byte           | byte        | byte        |
It seems clear that if you need to write code that is portable between the different Python implementations, you should steer clear of str and use bytes and unicode to unambiguously express your intent.
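As a minimal sketch of what "expressing your intent" can look like, the snippet below keeps text and encoded bytes in separate, explicitly typed values. It assumes an interpreter that accepts the u"" literal prefix and has a distinct encode/decode step (CPython 2.6+ or 3.3+); it has not been checked against Jython 2.5 or IronPython 2.6, and the names text, data and round_tripped are illustrative only.

```python
# Illustrative names; none of these come from the original post.
text = u"caf\u00e9"            # text: a sequence of Unicode code points
data = text.encode("utf-8")    # bytes: an explicit, encoded representation

# The encoded form is longer because U+00E9 needs two bytes in UTF-8.
assert len(text) == 4
assert len(data) == 5

# Decoding with the same codec recovers the original text.
round_tripped = data.decode("utf-8")
assert round_tripped == text
```

The point of the sketch is only that the codec is named at every boundary, so nothing depends on what str happens to mean on a given implementation.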
Of course, this is impossible, since the Python Standard Library is littered with uses of str. For example, in IronPython pickle.dumps() returns a str, just as in Python 2.6, but that str actually has multibyte storage. IronPython hides this well, but the abstraction can leak, resulting in much confusion. Again, Python 3.0 does what is right: pickle.dumps() returns a bytes instance.
These difficulties are most likely to occur when interfacing with native Java or .NET APIs that expect byte arrays, for example when pickling to database blobs.
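One way to keep a pickle unambiguous on its way to a blob column is to wrap it in a bytearray as soon as it is produced. The sketch below assumes CPython semantics, where the result of pickle.dumps() can be passed directly to bytearray(); on an implementation where str has multibyte storage an explicit conversion step may be needed first. The record payload is hypothetical.

```python
import pickle

record = {"id": 42, "name": u"example"}   # hypothetical payload

# Protocol 2 is the highest pickle protocol shared by Python 2.6 and Python 3.
blob = pickle.dumps(record, 2)

# A bytearray is always a sequence of integers in the range 0..255,
# whatever the interpreter's str type happens to be.
payload = bytearray(blob)
assert all(0 <= b <= 255 for b in payload)

# Convert back to an immutable byte string before unpickling.
restored = pickle.loads(bytes(payload))
assert restored == record
```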
In Jython a str instance can be converted to a Java byte array as follows.
>>> import jarray
>>> a = jarray.array("This is a string", 'b')
>>> a
array('b', [84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 115, 116, 114, 105, 110, 103])
The equivalent in IronPython, as provided by Michael Foord, is:
>>> from System import Array, Byte
>>> a = Array[Byte](tuple(Byte(ord(c)) for c in "This is a string"))
>>> a
Array[Byte]((<System.Byte object at 0x000000000000002B [84]>, <System.Byte object at 0x000000000000002C [104]>, <System.Byte object at 0x000000000000002D [105]>, <System.Byte object at 0x000000000000002E [115]>, <System.Byte object at 0x000000000000002F [32]>, <System.Byte object at 0x0000000000000030 [105]>, <System.Byte object at 0x0000000000000031 [115]>, <System.Byte object at 0x0000000000000032 [32]>, <System.Byte object at 0x0000000000000033 [97]>, <System.Byte object at 0x0000000000000034 [32]>, <System.Byte object at 0x0000000000000035 [115]>, <System.Byte object at 0x0000000000000036 [116]>, <System.Byte object at 0x0000000000000037 [114]>, <System.Byte object at 0x0000000000000038 [105]>, <System.Byte object at 0x0000000000000039 [110]>, <System.Byte object at 0x000000000000003A [103]>))
To go back from a byte array to a str, we can use identical code in IronPython and Jython.
>>> s = ''.join(chr(c) for c in a)
>>> s
'This is a string'
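The round trip can be tried without a Java or .NET runtime by standing in a plain list of integers for the byte array. This is only a sketch: ints_to_str is a hypothetical helper, not part of Jython or IronPython. The mask with 0xFF is an addition to the expression above, included because Java's byte type is signed, so values above 127 would appear as negative numbers in a Java byte array.

```python
original = "This is a string"

# Forward: one integer per character. This only makes sense when every
# code point fits in a single byte.
byte_values = [ord(c) for c in original]
assert max(byte_values) < 256


def ints_to_str(values):
    """Hypothetical helper: join byte values back into a string.

    Masking with 0xFF maps signed values (-128..127) onto 0..255.
    """
    return "".join(chr(v & 0xFF) for v in values)


restored = ints_to_str(byte_values)
assert restored == original
```

For pure ASCII input the mask changes nothing, so the helper behaves exactly like the unmasked join.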