OK OK, I couldn't resist that title but it probably goes a bit far. Let me try for a little more nuance:
PyPy.js: Now faster than CPython, on a single carefully-tuned benchmark, after JIT warmup.
It has been the better part of a year since I first started hacking on PyPy.js, an experiment in bringing a fast and compliant python interpreter to the web. I've been pretty quiet during that time but have certainly been keeping busy. Some of the big changes since my previous update include:
- An asmjs-to-python converter, so that PyPy's comprehensive JIT testsuite can be run on the asmjs backend.
- Some new optimizations in the emscripten compiler, which greatly reduce compiled code size.
- A basic interactive console, so you can try PyPy.js straight from your browser.
- And even uncovering an apparent bug in an LLVM optimization pass.
I encourage you to try the comparison on your own machine – do the following in a native python shell and in the PyPy.js demo shell and see how they compare:
    >>> from test import pystone
    >>>
    >>> # An initial run, which will warm up the JIT for this function.
    >>> pystone.main()
    Pystone(1.1) time for 50000 passes = 1.657
    This machine benchmarks at 30175 pystones/second
    >>>
    >>> # Subsequent runs should be much faster under PyPy.
    >>> pystone.main()
    Pystone(1.1) time for 50000 passes = 0.386
    This machine benchmarks at 129534 pystones/second
    >>>
    >>> # Although Chrome users may need to run it a few times to prime both the PyPy and v8 JIT.
    >>> pystone.main()
    Pystone(1.1) time for 50000 passes = 0.362
    This machine benchmarks at 138122 pystones/second
If all goes well then you should see the in-browser version benchmarking at more pystones/second than the standard python shell. My machine produced the following results (larger numbers are better):
| Interpreter | pystones/sec (cold) | pystones/sec (warm) |
|---|---|---|
| pypy.js in firefox | 29446 | 129870 |
Here it is in graph form for easier comparison, showing the pystone rating for each of twelve successive invocations of "pystone.main()" on CPython 2.7.5 and on PyPy.js in Firefox 28:
Now, just to be clear: all the usual caveats about benchmarking and performance apply here. This isn't a particularly scientific comparison, and I am being extremely cheeky in disregarding the JIT warmup time. But as a milestone, it is still a very gratifying result.
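If you would rather not eyeball console output, the cold-versus-warm split can be captured with a few lines of python. This is only a minimal sketch, not the harness I used for the graphs: `measure_runs`, `summarize` and `toy_workload` are hypothetical names invented for illustration, and the toy loop is a stand-in for pystone so that the snippet runs anywhere.

```python
import time


def measure_runs(workload, passes, runs=12):
    """Call workload(passes) `runs` times; return passes/second for each run."""
    rates = []
    for _ in range(runs):
        start = time.time()
        workload(passes)
        elapsed = time.time() - start
        # Guard against a zero reading from a coarse clock.
        rates.append(passes / max(elapsed, 1e-9))
    return rates


def summarize(rates, warmup_runs=1):
    """Report the first (cold) run separately from the average of the warm runs."""
    cold = rates[0]
    warm = rates[warmup_runs:]
    warm_avg = sum(warm) / len(warm)
    return {"cold": cold, "warm": warm_avg, "speedup": warm_avg / cold}


def toy_workload(passes):
    # Hypothetical stand-in for pystone: a tight loop of integer arithmetic.
    total = 0
    for i in range(passes):
        total += (i * i) % 7
    return total


if __name__ == "__main__":
    rates = measure_runs(toy_workload, 50000)
    result = summarize(rates)
    print("cold: %.0f passes/sec" % result["cold"])
    print("warm: %.0f passes/sec" % result["warm"])
    print("warm/cold ratio: %.2f" % result["speedup"])
```

On an interpreter that still ships the `test.pystone` module, passing `pystone.pystones` in place of `toy_workload` should give per-run pystone figures instead. Feeding the Firefox numbers from the table into `summarize([29446.0, 129870.0])` gives a warm/cold ratio of roughly 4.4, which is exactly the gap I am glossing over by ignoring warmup.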
Digging a little deeper, it's interesting to compare performance between Firefox and Chrome on this benchmark. Here is the same graph with Chrome (specifically, version 34) thrown into the mix:
Three interesting points stand out in this comparison:
- Chrome starts off running things much slower than Firefox, even after the initial run has warmed up the PyPy JIT.
- Chrome shows a marked performance increase, then an equally marked decrease, before settling into its steady-state behaviour.
- Chrome's steady-state performance on this test is significantly faster than Firefox's.
Firefox treats asmjs code specially – when it encounters an asmjs module declaration, it ahead-of-time compiles the entire thing down to machine code before executing any of it. This allows Firefox to offer consistent and predictable performance, without having to wonder whether the regular JIT machinery will correctly detect, profile, and optimize the code. So in Firefox we see a single warmup phase as PyPy.js emits its specialized asmjs code, followed by fairly stable performance.
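Chrome, by contrast, appears to give asmjs no special treatment: the code goes through the same JIT machinery as any other javascript, which would explain the slower start and the swings in performance before it reaches a steady state. However, that approach also means that it can optimize based on the actual runtime behaviour of the code.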
The PyPy.js JIT works by generating a secondary asmjs module at runtime to contain the generated code. This code needs to call functions from the main interpreter module to do things like write barriers, garbage collection and so on, and the pystone benchmark happens to call these functions a lot. Since Firefox compiles each asmjs module ahead-of-time independently, it must treat these as generic function calls and route them through a general-purpose code path. By contrast, Chrome is able to optimize the two modules as a single unit and potentially do clever things like inline these calls (the v8 introspection tools could probably pinpoint the precise optimizations that it does here, but I haven't dug in that deep).
The result is a pretty convincing win for Chrome in this comparison.
It turns out that cross-asmjs-module function calls are outside of Firefox's current sweet-spot, and that costs it dearly in this benchmark.
On the other hand, there is no fundamental reason why Firefox can't optimize such calls. It has all the information it needs, and it's simply a matter of implementing the additional specialized code-paths, of adding a little bit of "JIT" back into the ahead-of-time compilation. Ultimately this is just a bug in Firefox's asmjs support – in fact I filed it as such and have submitted some preliminary fixes which bring performance on this benchmark back up to being competitive with Chrome:
Finally, to address the elephant in the room: a completely fair comparison would pit PyPy.js against a native PyPy interpreter, not just a native CPython. Can JITing to asmjs compete with JITing to native code? My machine produced the following, much more humbling results:
That's an order of magnitude difference, from around 300-thousand to around 2-million pystones/second.
For most code we expect a slowdown of between 2 and 3 times when going from native code to asmjs-in-firefox, so being 10 times slower here is a little disappointing. But I believe at least some of the difference can be made up by continuing improvements in function call overheads, such as Bug 982036 and Bug 962641. This will be an interesting metric to track as improvements continue in both the PyPy.js codebase itself and in browser support for asmjs.
So, that's the fun part. As a proof of concept this has been a very interesting, very entertaining project. But I don't want to pretend that it's "done", or that this is clearly the way forward for python on the web. There are many ways in which PyPy.js is still far from ideal. My ongoing hit-list includes: