Home > computing, IronPython, Python, software > Specification of rich comparison protocol use by the Python standard library

Specification of rich comparison protocol use by the Python standard library

I’m in the process of writing some code which ideally would work unchanged on CPython 2.x, CPython 3.x IronPython 2.x and Jython 2.x. Given the nature of the code I’m writing, that seems eminently possible with very few workarounds.

One of the changes between Python 2 and Python 3 was the removal of the cmp() function and its corresponding __cmp__() special method. Python 3 requires the use of the rich comparison operators __lt__(), __gt__(), __ge__(), __le__(), __eq__() and __ne__(). This change is well publicised and is fine so far as it goes; it’s usually no problem to modify the code accordingly. However, the same document goes on to say,

Use __lt__() for sorting, __eq__() with __hash__(), and other rich comparisons as needed.

This is clear guidance, if not a specification, that sort routines will use __lt__() for sorting. We don’t have to look much further for a statement closer to a specification though. In the Python Sorting HOWTO we can find the following clear declaration:

The sort routines are guaranteed to use __lt__() when making comparisons between two objects. So, it is easy to add a standard sort order to a class by defining an __lt__() method.

Unfortunately, this fact is not mentioned in the documentation for list.sort() or the sorted() built-in, so it’s difficult to find. Even more unfortunately for me, the documentation for the standard library heapq module does not specify at all what protocols it requires of the objects it will be ordering.

Now all of this became an issue for me today when I implemented an important optimisation which required that I implement an incremental (i.e. lazy) partial sort rather than a complete eager sort of a sequence. Partial sorts can be efficiently implemented using heaps – hence my encounter with the heapq module. My migration from sort() to heapq worked fine on CPython 2.x and CPython 3.x but when I ran my tests with IronPython I got failures for some sorting results.

Soon I determined that the problem was that, as per the documentation for sorting, I’d implemented only __lt__() and __eq__() rather than the full set of six rich comparison operators. This worked fine with my original code and with my heapq based sort in CPython because those sort implementation do indeed only call __lt__() and __eq__(). However, the IronPython implementation of heapq also calls __gt__() and so failed. Note that IronPython hasn’t done anything wrong here – it’s following the letter of the canonical CPython documentation, if not the spirit. I’m not sure it’s fair to consider this a bug in IronPython – rather it exposes a small but important weakness in the CPython documentation.

There are perhaps two morals to this story: The first is to always implement all or nothing when in comes to the rich comparison operators just like your mother taught you. The second is that we’d avoid these sort of difficulties if the Python documentation was a little more explicit about what protocols are expected of objects by its algorithms and containers.

The existence of multiple Python implementations has been extremely effective at defining what it means to be Python as opposed to CPython. Many inconsistencies such as the one noted above have been flushed out over the years. For this reason alone I’m hesitant to suggest that the folks porting the standard library from CPython to IronPython or Jython should actually read the CPython implementations to ensure compatibility. I think it’s much preferable that such things are reimplemented from the documentation and test suite with consequential improvements for both.

Categories: computing, IronPython, Python, software Tags:
  1. No comments yet.
  1. No trackbacks yet.