
IronPython 2.0 and Jython 2.5 performance compared to Python 2.5

My previous post covering the performance problems I’ve been experiencing with IronPython raised some questions about whether the low performance was an effect peculiar to my system, or to my program — the OWL BASIC compiler — where the problem was first noticed. To briefly recap, I’d determined that IronPython was around 100× slower than CPython on the same program.

Since then, I’ve had time to reproduce the results with a small and completely unremarkable Python program, and also to run the tests on a different system. I had suspected that in the OWL BASIC compiler, my Python visitor implementation, which is used in applying transformations to the abstract syntax tree, was to blame. I set about condensing a tree visitor down to a small example, but I never got that far. It is sufficient to simply build a large binary tree to demonstrate the dramatic differences in the performance characteristics of the three main Python implementations.

The benchmark

Here is that test program, which just builds a simple binary tree of objects to the requested depth.

import sys

class Node(object):
    counter = 0

    def __init__(self, children):
        Node.counter += 1
        self._children = children

def make_tree(depth):
    if depth > 1:
        return Node([make_tree(depth - 1), make_tree(depth - 1)])
    return Node([])

def main(argv=None):
    if argv is None:
        argv = sys.argv
    depth = int(argv[1]) if len(argv) > 1 else 10
    root = make_tree(depth)
    print Node.counter
    return 0

if __name__ == '__main__':
    sys.exit(main())

The program builds a binary tree to the depth supplied as the only command line argument, or ten if one is not supplied. It counts the number of nodes as they are built. Remember that the merits or otherwise of this program are not the point! The point is the performance difference between the Python implementations when it is run.
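As a quick sanity check on that counter: a full binary tree of depth d contains 2**d - 1 nodes, so the default depth of ten should report 1023. This standalone snippet repeats the benchmark's logic, with print written as a function call so it runs under both Python 2 and 3:

```python
class Node(object):
    counter = 0
    def __init__(self, children):
        Node.counter += 1          # count every node as it is constructed
        self._children = children

def make_tree(depth):
    if depth > 1:
        return Node([make_tree(depth - 1), make_tree(depth - 1)])
    return Node([])

make_tree(10)
print(Node.counter)  # 2**10 - 1 = 1023
```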

My benchmarking approach has been to run this script five times for each tree depth, from a depth of one upwards to 22, or until my patience was exhausted. I’ve taken the minimum time from each set of five runs. Since there is a non-linear relationship between the depth of the tree and the number of nodes contained therein, logarithmic axes are used in all the charts that follow.
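The harness itself was nothing special; a sketch along these lines captures the approach of wall-clocking each interpreter launch and keeping the minimum (the script name tree_benchmark.py is illustrative, not the actual file):

```python
import subprocess
import sys
import time

def best_of(n, cmd):
    """Run cmd as a subprocess n times; return the minimum wall-clock time in seconds."""
    times = []
    for _ in range(n):
        start = time.time()
        subprocess.call(cmd)
        times.append(time.time() - start)
    return min(times)

# e.g. time the benchmark at depth 15 under whichever interpreter runs this harness:
# best_of(5, [sys.executable, 'tree_benchmark.py', '15'])
```

Taking the minimum rather than the mean discards runs perturbed by other system activity, which matters when timings include process start-up.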

64-bit Windows Vista x64

Here are the results for the first test machine – with dual quad-core 1.86 GHz Xeons with 4 GB RAM running Vista x64, testing IronPython on .NET 2.0, Jython 2.5rc2 on Java Hotspot 1.6.0 and Python 2.5.2.


Figure 1. Creation time for a binary tree including Python virtual machine startup on Windows Vista x64 with 1.86 GHz Xeon processors.

In Figure 1 we see that above 1000 nodes or so (a tree depth of 10), performance for IronPython begins to degrade rapidly. CPython holds out for another two orders of magnitude before the significant costs begin to be felt. It’s interesting to see that although Jython is in the middle of the pack, it scales much better than CPython, surpassing it at around half a million nodes (a tree depth of 19).

In my application — a compiler — virtual machine (VM) start-up time is important; however, in many long-running applications this is not the case, so it is interesting to subtract VM start-up time from each series, which we see in Figure 2, below.

Figure 2. By subtracting VM start-up time, we get a picture more interesting for long-running processes.


Below 100 tree nodes, there is a lot of noise in these measurements. Above 100 nodes it’s easy to see that the blue IronPython curve is at least two chart divisions above the red CPython curve — that’s two orders of magnitude, or 100× slower, and getting relatively worse as the size of the tree increases.
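The VM start-up figure subtracted in Figure 2 is easy to estimate: time the interpreter running an empty program, and take the best of several runs. A minimal sketch (the benchmark script name is illustrative):

```python
import subprocess
import sys
import time

def wall_time(cmd):
    # Wall-clock a single subprocess invocation, in seconds.
    start = time.time()
    subprocess.call(cmd)
    return time.time() - start

# Start-up cost: the time taken to run a program that does nothing.
startup = min(wall_time([sys.executable, '-c', 'pass']) for _ in range(5))

# Subtracting it from a full run isolates the tree-building work, e.g.:
# work = wall_time([sys.executable, 'tree_benchmark.py', '15']) - startup
```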

32-bit Windows XP x86

Responses to my earlier article suggested that trying IronPython 2.0.1 with Ngen’ed binaries on x86 might make a difference. Well, to cut a long story short, it doesn’t, but here are the details. These tests were run on a 900 MHz Pentium M Centrino laptop with 768 MB RAM, and so cannot be directly compared with those above, although it’s notable that a one-year-old workstation is only twice as fast as a five-year-old laptop. Moore’s law certainly isn’t delivering here!

Figure 3. The performance profiles are very similar with IronPython 2.0.1 on x86.


On x86, IronPython is still 100× slower than CPython, and Jython still scales better. It seems the essence of this benchmark does not depend on the hardware or CLR platform on which it is run.

I’ll close by re-presenting the data from the x86 benchmarks as multiples of CPython performance, because it dramatically demonstrates the different responses of IronPython and Jython to the scale of the problem. Again we see Jython catching up with CPython at a tree depth of 19, just as we saw on x64, and IronPython performing 6000× worse than CPython at a tree depth of 15. A tree of this size, with over thirty thousand nodes, is very similar in scale to the ASTs found in OWL BASIC during compilation of large programs.

Figure 4. Performance of IronPython and Jython as multiples of CPython performance.


Conclusions

  • IronPython can be very slow, even on programs in the microbenchmark category, which are doing standard operations such as building trees. Presumably there are still significant optimizations to be made in IronPython to bring its performance closer to that of the other Python implementations. Hopefully, this example and the measurements can contribute to that improvement.
  • Jython may scale better than CPython if your application exercises Python in similar ways to this benchmark. Speculatively, that could have implications for projects such as SCons, which build large in-memory graphs.
  • I suppose if nothing else we have demonstrated in passing that Java can be faster than C for some non-trivial programs (like a Python interpreter) running a trivial program, like this benchmark.
  1. codekoala
    May 22nd, 2009 at 14:41 | #1

    Thank you for this! It’s always fun to stack things up against each other, especially when there are visual aides to satisfy those who would rather not read (like me)!

  2. Vassil
    May 22nd, 2009 at 15:45 | #2

    Good post. However in the last conclusion you are comparing apples to oranges.

  3. fijal
    May 22nd, 2009 at 16:08 | #3

    hi, nice data. Couple of issues:

    * I think this benchmark heavily measures gc performance. It’s good, because
    gc performance is essential in a lot of applications, but it’s worth noting.

    * The JVM’s gc and .NET’s gc are known to be much better than CPython’s. The IronPython guys
    are even claiming their gc is better than the JVM’s, so it’s surprising.

    * What about measuring PyPy? From my rough and fast (completely unscientific :-)
    measurements it seems that PyPy is faster than CPython starting from about 18-19


  4. Filox
    May 22nd, 2009 at 16:09 | #4

    I’m a bit off topic here, but can you tell me what tool you used to make these awesome graphs?

  5. Curt Hagenlocher
    May 22nd, 2009 at 16:12 | #5

    Try storing the counter as a global variable instead of a class-level member of Node — I think you’ll notice a dramatic improvement.

  6. May 22nd, 2009 at 16:23 | #6

    I never used IronPython but always have an eye on it. Your plots are very nice, what program did you use to generate them?

  7. Jonathan Allen
    May 22nd, 2009 at 16:26 | #7

    IronPython 2.6 was just released. Would you mind rerunning your tests with it?

  8. Robert Smallshire
    May 22nd, 2009 at 17:54 | #8

    Bloguero Connor :
    IronPython is great – apart from this issue.

    The plots were made with Apple’s Keynote, from iWork.

  9. Robert Smallshire
    May 22nd, 2009 at 17:58 | #9

    @Bloguero Connor
    Keynote for the plots.

  10. Robert Smallshire
    May 22nd, 2009 at 17:59 | #10

    @Curt Hagenlocher
    Sure, but changing the benchmark code is beside the point, which is that there is huge variation between the different Python implementation running the same code.

  11. May 22nd, 2009 at 18:03 | #11

    @Robert Smallshire
    It’s not beside the point if the takeaway as reported on Reddit and Twitter is that “IronPython is way slow”. I know that’s not the point you’re making, but that’s how people are reporting it.

  12. Curt Hagenlocher
    May 22nd, 2009 at 18:05 | #12

    Out of curiosity, are you running an x64 CPython on the 64-bit platform? It probably matters.

  13. Robert Smallshire
    May 22nd, 2009 at 18:12 | #13

    @Curt Hagenlocher
    I’m using 32-bit CPython on Vista x64.

  14. Robert Smallshire
    May 22nd, 2009 at 18:22 | #14

    Can you explain why you think I’m comparing apples to oranges in my final conclusion? It’s a comparison of two implementations of Python, one written in C and the other written in Java; both are doing the same job. Is it only reasonable to compare programs in different languages that are direct, and therefore likely non-idiomatic, transliterations?

  15. Nick Craig-Wood
    May 22nd, 2009 at 20:53 | #15

    If you disable the Python garbage collector while making the tree you’ll find that CPython runs a lot quicker. In my experiments with a tree size of 20 it made it 12 times quicker! Add

    import gc

    at the start, then modify the code like this:

    gc.disable()
    root = make_tree(depth)
    gc.enable()

    This is a known problem with the CPython garbage collector when creating lots of objects.

  16. wolf550e
    May 22nd, 2009 at 20:56 | #16

    On a Core 2 Duo E6550 (2.33GHz) running 64bit linux, compiled pypy interpreter with the hybrid gc is faster than cpython beginning with 18. At 17, cpython is faster. Times (best of 5 runs) are:

    cpython:
    17: user 0m0.718s
    18: user 0m2.086s

    pypy:
    17: user 0m0.882s
    18: user 0m1.757s

  17. Robert Smallshire
    May 22nd, 2009 at 21:11 | #17

    @Curt Hagenlocher
    I’ve modified the benchmark, as you suggested, to use a global rather than a class attribute, and the improvement is spectacular. I’ve posted a new article with the results.

  18. Dave Taylor
    May 22nd, 2009 at 23:35 | #18

    I also thought that the comparison between C and Java here wasn’t adequate for a conclusion about the capabilities of the languages. It is true that they are doing the same thing, however they may be (most likely are) getting there in different ways, so all we can say here is that an inefficient algorithm in C would run slower than an efficient algorithm in Java. If you wanted to compare Java and C you would have to use a test program translated as accurately as possible (obviously not syntax but algorithms) to determine the fastest.

    It’s irrelevant anyway, as Java (and Python for that matter) is written in C and so any program written in Java is merely a reflection of the execution speed of C bytecode. It would be more accurate to determine if a crap programmer could make faster code in Java than C in less time – the raison d’être of all interpreted languages.

  19. ita
    May 23rd, 2009 at 01:46 | #19

    Scons does not support jython yet, but waf does.
    The benchmark does not look too good against cpython.

  20. fijal
    May 23rd, 2009 at 03:01 | #20

    If you try with --gcrootfinder=asmgcc, it will be even faster (32-bit linux only though, a bit experimental)

    @Nick craig-wood
    The pure fact that this issue is well known should be enough to fix it (it’s not
    even actually that hard). Doing random hacks and taking care by hand if you have
    or not circular references does not sound like a solution to me.


  21. Jon Harrop
    May 23rd, 2009 at 12:21 | #21

    Use F# instead of Python. It is much faster and much better suited to compiler writing.

  22. Robert Smallshire
    May 23rd, 2009 at 12:39 | #22

    @Jon Harrop
    I’ve seriously considered switching to F# for this project. I’ve been using F# on other unrelated projects with a fair amount of success – writing small compilers for DSLs for image processing operations. I have to admit though, in my shallow investigations into using fslex and fsyacc, they didn’t seem to be as debuggable with complex grammars as PLY, which I’m using with Python.

  23. ottorommel
    May 27th, 2009 at 01:56 | #23

    @Robert Smallshire
    The ‘new article’ link gets a can’t preview drafts error.


  24. Robert Smallshire
    May 27th, 2009 at 15:54 | #24

    Thanks – it should be working now.

  25. Mee
    August 26th, 2015 at 15:33 | #25

    codekoala :
    Thank you for this! It’s always fun to stack things up against each other, especially when there are visual aides to satisfy those who would rather not read (like me)!

