Specification of rich comparison protocol use by the Python standard library
I’m in the process of writing some code which ideally would work unchanged on CPython 2.x, CPython 3.x IronPython 2.x and Jython 2.x. Given the nature of the code I’m writing, that seems eminently possible with very few workarounds.
One of the changes between Python 2 and Python 3 was the removal of the
cmp() function and its corresponding
__cmp__() special method. Python 3 requires the use of the rich comparison operators
__ne__(). This change is well publicised and is fine so far as it goes; it’s usually no problem to modify the code accordingly. However, the same document goes on to say,
__hash__(), and other rich comparisons as needed.
This is clear guidance, if not a specification, that sort routines will use
__lt__() for sorting. We don’t have to look much further for a statement closer to a specification though. In the Python Sorting HOWTO we can find the following clear declaration:
The sort routines are guaranteed to use
__lt__()when making comparisons between two objects. So, it is easy to add a standard sort order to a class by defining an
Unfortunately, this fact is not mentioned in the documentation for
list.sort() or the
sorted() built-in, so it’s difficult to find. Even more unfortunately for me, the documentation for the standard library
heapq module does not specify at all what protocols it requires of the objects it will be ordering.
Now all of this became an issue for me today when I implemented an important optimisation which required that I implement an incremental (i.e. lazy) partial sort rather than a complete eager sort of a sequence. Partial sorts can be efficiently implemented using heaps – hence my encounter with the
heapq module. My migration from
heapq worked fine on CPython 2.x and CPython 3.x but when I ran my tests with IronPython I got failures for some sorting results.
Soon I determined that the problem was that, as per the documentation for sorting, I’d implemented only
__eq__() rather than the full set of six rich comparison operators. This worked fine with my original code and with my
heapq based sort in CPython because those sort implementation do indeed only call
__eq__(). However, the IronPython implementation of
heapq also calls
__gt__() and so failed. Note that IronPython hasn’t done anything wrong here – it’s following the letter of the canonical CPython documentation, if not the spirit. I’m not sure it’s fair to consider this a bug in IronPython – rather it exposes a small but important weakness in the CPython documentation.
There are perhaps two morals to this story: The first is to always implement all or nothing when in comes to the rich comparison operators just like your mother taught you. The second is that we’d avoid these sort of difficulties if the Python documentation was a little more explicit about what protocols are expected of objects by its algorithms and containers.
The existence of multiple Python implementations has been extremely effective at defining what it means to be Python as opposed to CPython. Many inconsistencies such as the one noted above have been flushed out over the years. For this reason alone I’m hesitant to suggest that the folks porting the standard library from CPython to IronPython or Jython should actually read the CPython implementations to ensure compatibility. I think it’s much preferable that such things are reimplemented from the documentation and test suite with consequential improvements for both.