
IronPython hammers CPython when not mutating class attributes

Earlier today I posted the second article in what is turning out to be a short series investigating why IronPython is around 100× slower than CPython when running the front-end of my OWL BASIC compiler.

The most informative comment was from Curt Hagenlocher, who works on IronPython in the Visual Studio Managed Languages group at Microsoft.

Curt suggested,

Try storing the counter as a global variable instead of a class-level member of Node — I think you’ll notice a dramatic improvement.

The modified benchmark program looks like this:

counter = 0

class Node(object):
    
    def __init__(self, children):
        global counter
        counter += 1
        self._children = children
                
def make_tree(depth):
    if depth > 1:
        return Node([make_tree(depth - 1), make_tree(depth - 1)])
    else:
        return Node([])
        
def main(argv=None):
    global counter
    if argv is None:
        argv = sys.argv
    depth = int(argv[1]) if len(argv) > 1 else 10
       
    root = make_tree(depth)
    print counter
    return 0
        
if __name__ == '__main__':
    import sys
    sys.exit(main())
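
If you want to reproduce the timings below, the script takes the binary tree depth as its only argument and runs unchanged under all three implementations. A minimal invocation looks like this (the file name tree_bench.py is just a placeholder, and ipy is the IronPython console launcher):

    python tree_bench.py 20     # CPython 2.5.2
    jython tree_bench.py 20     # Jython 2.5rc2
    ipy tree_bench.py 20        # IronPython 2.0 on .NET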

A dramatic improvement!

Well, Curt wasn’t wrong. This made a phenomenal difference, with IronPython completing in only 12% of the time taken by CPython – over 8× faster with a binary tree depth of 20.

Let’s look in detail at the results. All results are from a dual quad-core 1.86 GHz Xeon with 4 GB RAM; as before, each benchmark was run five times and the shortest of the five times taken.

The three test environments are:

  1. CPython 2.5.2 x86 32-bit
  2. Jython 2.5rc2 on Java Hotspot 1.6 32-bit
  3. IronPython 2.0 on .NET 2.0 x64

Figure 1. The relative performance of the three main Python implementations on a benchmark that uses a global counter, rather than mutating a class attribute.

Here we can see how IronPython’s performance has been improved hugely by this simple change. Although start-up time dominates at the smaller problem sizes, both Jython and IronPython now surpass CPython at around half a million nodes.

Removing start-up time, which may be irrelevant for long-running processes, gives us the following chart:


Figure 2. The performance of the three main Python implementations excluding start-up time.

Again there is a lot of noise in the data below 1000 nodes, but it is clear that Jython scales better than IronPython, which in turn is scaling better than CPython.

Up until now I’ve been using a log-log scale in the charts because of the wide variation in performance between the different implementations, but now that the performance gap is much narrower it’s difficult to get a sense of just how much faster IronPython is on the modified benchmark. Let’s throw in a log-linear plot to help us appreciate what’s going on:


Figure 3. A linear representation of the same data as in Figure 2, to highlight the performance multiple between IronPython and CPython in the larger tests.

It’s perhaps easier to see now that IronPython is doing in 14 seconds what takes CPython 114 seconds to achieve!

Finally, let’s plot those results as we did before, as multiples of CPython performance:


Figure 4. Execution time of Jython and IronPython as multiples of CPython performance.

In this chart it is easy to see that, once we pass half a million tree nodes (a tree depth of 19), both Jython and IronPython significantly beat CPython.

Explanation

Curt Hagenlocher offers the explanation in a comment in the thread on Reddit.

In this particular case, IronPython is slow because of the update to Node.counter. Currently, any update to a class will increment the version number for the class, which will have the effect of invalidating any rules compiled for that class. Effectively, the same rules are getting compiled over and over again. Moving the counter to a global should result in performance on par with that of CPython.

which is absolutely correct, except that he’s underselling the relative gain. IronPython is not only on a par with CPython, it can outperform it by a factor of eight!
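
For reference, the pattern that triggers this rule invalidation looked roughly like the following sketch of the earlier benchmark (reconstructed from the description above, not the exact original source), in which every construction rebinds a class attribute of Node:

    class Node(object):

        counter = 0

        def __init__(self, children):
            # Rebinding Node.counter mutates the class itself, so IronPython
            # bumps the class's version number and discards the rules it had
            # compiled for Node, forcing them to be recompiled over and over.
            Node.counter += 1
            self._children = children

The modified version at the top of this post avoids the problem by only ever rebinding a module-level name.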

With this knowledge in hand, I can now approach optimization of my OWL BASIC compiler, which lies back at the start of this illuminating tale.

Conclusion

  • Avoiding mutation of Python class attributes can have significant benefits for IronPython performance.
  • Both IronPython and Jython scale better than CPython on this benchmark, and show superior performance for large trees of nodes.
  1. May 22nd, 2009 at 23:39 | #1

    Nice! Dino’s now changed the way we generate code in this case, so you won’t have to change any of your compiler code if you’re willing to pull the IronPython source from CodePlex and build it yourself. He’s also written about the issue on his blog at http://blogs.msdn.com/dinoviehland/archive/2009/05/22/on-performance.aspx

  2. mycall
    May 23rd, 2009 at 01:51 | #2

    Nick Craig-Wood made a valid point on your previous post which you haven’t tested for yet: try disabling the Python garbage collector while making the tree.

  3. aguy
    May 23rd, 2009 at 04:13 | #3

    read this and the previous, v. interesting read. i like the conclusions. wonder where class updates are on the list of things to do.

  4. aguy
    May 23rd, 2009 at 04:14 | #4

    haha teaches me to read the other comments first. nice.

  5. Hans Roman
    May 23rd, 2009 at 05:01 | #5

    Nice! But I would like to see this test with Mono too.

  6. Chris
    May 24th, 2009 at 19:46 | #6

    The reason for CPython’s sharp drop-off is a known degenerate case in the GC when you create large numbers of objects without freeing any. The GC is invoked when the total number of tracked objects goes up by more than N (depends on generation and settings), on the assumption that if you’re creating that many more objects than you are destroying, then you probably have some cycles that could use cleaning up.

    The end result in this sort of object creation pattern, though, is that the cyclic GC is being invoked on every object, thus causing an exponential slowdown in object creation time.

    This generally plays out pretty well in practice. It tends to bite people making benchmarks a lot, though. CPython 2.7 has some changes to make this heuristic smarter. IronPython and Jython of course rely on the platform’s traditional GC rather than CPython’s cycle collector and don’t have this specific issue.
