Posts Tagged ‘Python’

Inadvertent psychometric testing with Python

At some point – such as during the course of a job interview – you may have taken a numerical reasoning test with questions such as this:

Q. What is the next number in the sequence 81, 87, 84, 90, 87, ?

See the end of the article for the answer.

Now, another one, this time in your Python interpreter.

>>> s = 'python'
>>> s[:-5]
'p'
>>> s[:-4]
'py'
>>> s[:-3]
'pyt'
>>> s[:-2]
'pyth'
>>> s[:-1]
'pytho'

and the next in the series is…

>>> s[:-0]
''

The surprise arises because -0 is simply 0, so s[:-0] means s[:0]: index zero refers to the beginning of the string rather than to one beyond the end, and the result is the empty string. It took me quite a while today to locate a bug caused by this combination of negative indexing and slicing, subtly concealed as s[:-x], because intuitively we think that the identity:

s[:len(s) - x] == s[:-x]

is true for all x, when in fact it’s only true for 0 < x <= len(s). If Python were built on ones' complement binary representations, I'd suggest that we use negative zero to represent the virtual character one beyond the end of the string and positive zero to represent the character at the beginning. I suspect, however, that my PEP would be short-lived. And rightfully so!
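
A minimal sketch of one way to sidestep the trap: compute the end index explicitly and clamp it at zero, rather than negating x. The helper name drop_last is hypothetical, made up for this illustration.

```python
def drop_last(s, x):
    """Return s without its last x items (hypothetical helper).

    Unlike s[:-x], this returns s unchanged when x == 0, and the
    max() clamp yields an empty result when x > len(s).
    """
    return s[:max(len(s) - x, 0)]


s = 'python'

# The naive spelling: x == 0 silently produces the empty string.
naive = [s[:-x] for x in range(4)]

# The explicit end index behaves consistently for every x >= 0.
safe = [drop_last(s, x) for x in range(4)]

# A terser alternative: -0 is falsy, so the stop falls back to None,
# which means "slice to the end".
terse = [s[:-x or None] for x in range(4)]

print(naive)  # ['', 'pytho', 'pyth', 'pyt']
print(safe)   # ['python', 'pytho', 'pyth', 'pyt']
```

The clamp matters because without it len(s) - x goes negative once x exceeds len(s), and the slice wraps around to count from the end again.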

˙Ɛ6 sı ɹǝʍsuɐ ǝɥ⊥

Categories: Python

asq 1.0 released

June 6th, 2011

I’ve just released asq 1.0 – a LINQ-inspired API for performing queries over Python objects. The project had been on the unpublished one-day-I’ll-finish-this back burner for a couple of years now, but recently I found myself wanting it in the course of developing another project. I decided to go public with the incomplete version I had back in January 2011, calling what I had asq 0.5 and hoping that the harsh glare of publicity would force me to make progress. I got 18 downloads without any promotional effort and that spurred me on towards a 0.9 release two months later which was feature complete and pretty stable.

Over the course of the next three months, asq 0.9 clocked up over 200 downloads.

I decided at this point that I’d hold back on any actual marketing until I could produce a solid 1.0 with comprehensive documentation. Well, asq 1.0 isn’t even 24 hours old yet and it has already been downloaded 174 times.

Categories: asq, computing, Python

String compatibility between Python implementations

June 18th, 2009

Jython and IronPython run on platforms where strings are Unicode-capable by default. Both implementations have chosen to make str essentially an alias for unicode in Python source code. The bytes type, introduced in PEP 358 as part of the transition to a fully Unicode Python 3.0, is unambiguously a sequence of single-byte values. We can see in the table below that Jython and IronPython are caught between what is most practical for interoperability with existing code and their host platforms on the one hand, and the Right Thing as delivered by Python 3.0 on the other.

Type     Jython 2.5  IronPython 2.6  CPython 2.6  CPython 3.0
str      multibyte   multibyte       byte         multibyte
unicode  multibyte   multibyte       multibyte    multibyte
bytes    byte        byte            byte         byte

It seems clear that if you need to write code that is portable between the different Python implementations, you should steer clear of str and use bytes and unicode to express your intent unambiguously.

Of course, this is impossible since the Python Standard Library is littered with uses of str. For example, in IronPython pickle.dumps() returns str, just as in CPython 2.6, but that str actually has multibyte storage. IronPython hides this well, but the abstraction can leak, resulting in much confusion. Again Python 3.0 does what is right, and pickle.dumps() returns a bytes instance.

These difficulties are most likely to occur when interfacing with native Java or .NET APIs that expect byte arrays, for example when pickling to database blobs.
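
To illustrate the blob case on CPython 3, where pickle.dumps() returns bytes, here is a small sketch using the standard sqlite3 module with an in-memory database. The table name, column names and sample record are invented for the example, and it does not exercise Jython or IronPython at all.

```python
import pickle
import sqlite3

# Hypothetical sample data, purely for illustration.
record = {"name": "example", "values": [1, 2, 3]}

# On CPython 3 the pickled payload is a bytes instance.
payload = pickle.dumps(record)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")

# bytes objects are stored as BLOBs without any conversion step.
conn.execute("INSERT INTO blobs (data) VALUES (?)", (payload,))

# The BLOB comes back as bytes, ready to hand to pickle.loads().
(stored,) = conn.execute("SELECT data FROM blobs").fetchone()
restored = pickle.loads(stored)
conn.close()

print(type(payload).__name__, restored == record)
```

Because the payload is bytes from start to finish, there is no point at which a multibyte str can sneak into the round trip.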

In Jython a str instance can be converted to a Java byte array as follows.

>>> import jarray
>>> a = jarray.array("This is a string", 'b')
>>> a
array('b', [84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 115, 116, 114, 105, 110, 103])

The equivalent in IronPython, as provided by Michael Foord, is:

>>> from System import Array, Byte
>>> a = Array[Byte](tuple(Byte(ord(c)) for c in "This is a string"))
>>> a
Array[Byte]((<System.Byte object at 0x000000000000002B [84]>, <System.Byte object at 0x000000000000002C [104]>, <System.Byte object at 0x000000000000002D [105]>, <System.Byte object at 0x000000000000002E [115]>, <System.Byte object at 0x000000000000002F [32]>, <System.Byte object at 0x0000000000000030 [105]>, <System.Byte object at 0x0000000000000031 [115]>, <System.Byte object at 0x0000000000000032 [32]>, <System.Byte object at 0x0000000000000033 [97]>, <System.Byte object at 0x0000000000000034 [32]>, <System.Byte object at 0x0000000000000035 [115]>, <System.Byte object at 0x0000000000000036 [116]>, <System.Byte object at 0x0000000000000037 [114]>, <System.Byte object at 0x0000000000000038 [105]>, <System.Byte object at 0x0000000000000039 [110]>, <System.Byte object at 0x000000000000003A [103]>))

Going back the other way, from the byte array to a str, we can use identical code in IronPython and Jython.

>>> s = ''.join(chr(c) for c in a)
>>> s
'This is a string'
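
For comparison, a sketch of the same ord()/chr() round trip on CPython 3, where bytes is a built-in type. This is an illustration for CPython 3 only, not a replacement for the Jython or IronPython code above, and the variable names are made up for the example.

```python
text = "This is a string"

# Build a bytes object from the code point of each character.
# bytes() raises ValueError if any code point is above 255.
byte_values = bytes(ord(c) for c in text)

# For code points below 256 this matches a Latin-1 encoding.
assert byte_values == text.encode("latin-1")

# Iterating over bytes yields ints, so chr() reverses the mapping.
round_tripped = ''.join(chr(b) for b in byte_values)

print(list(byte_values[:4]))  # [84, 104, 105, 115]
print(round_tripped)          # This is a string
```

The one-character-per-byte mapping only holds for code points below 256; anything beyond that needs a real encoding such as UTF-8.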