Posts in "Python"

Aug. 17, 2010

PyEnchant: now with OSX!

The latest release of PyEnchant now contains an experimental binary distribution for OSX, as both an mpkg installer and a python egg. In theory, users on OSX 10.4 or later should be able to just drop pyenchant-1.6.3-py2.6-macosx-10.4-universal.egg somewhere on sys.path and be up and running and spellchecking with ease.

If you're a Mac user, please try it out and let me know if anything doesn't work the way you expect.

The experience of building this was quite interesting, and more than a little painful, because I wanted to build a proper universal library that could be used on almost any Mac out there. The gory details can be found in pyenchant-bdist-osx-sources-1.6.3.tar.gz; this post is a quick set of notes that might help others get started.

Fortunately for me, the familiar build toolchain of "./configure; make; make install" is pretty much intact on OSX. The only real trickery is getting the resulting library to work on systems other than your own. I hit two major stumbling blocks in this regard:

  • how to build fat binaries that still work on older versions of OSX?
  • how to make the libraries relocatable, so they can be installed at any location?

This may all be old news to seasoned OSX veterans, but hopefully these notes can help out other expat linux users like me.

Continue reading...

Aug. 9, 2010

Compiling RPython Programs

Inspired by a recent discussion on Reddit about a Python-to-C++ compiler called Shed Skin, I decided to write up my own experiences on compiling (a restricted subset of) Python to a stand-alone executable. My tool of choice is the translation toolchain from the PyPy project – a project, by the way, that every Python programmer should take a look at.

Take this very exciting (EDIT: and needlessly inefficient) python script, which we'll assume is in a file "factors.py":

    def factors(n):
        """Calculate all the factors of n."""
        # Upper bound is n/2 + 1 so that i == n/2 is also tried (e.g. n = 4).
        for i in xrange(2, n / 2 + 1):
            if n % i == 0:
                return [i] + factors(n / i)
        return [n]

    def main(argv):
        n = int(argv[1])
        print "factors of", n, "are", factors(n)

    if __name__ == "__main__":
        import sys
        main(sys.argv)

We can of course run this from the command-line using the python interpreter, but gosh that's boring:

    $> python factors.py 987654321
    factors of 987654321 are [3, 3, 17, 17, 379721]

Instead, let's compile it into a stand-alone executable! Grab the latest source tarball from the PyPy downloads page and unzip it in your work directory:

Continue reading...

July 21, 2010

Starting Faster

I've just spent a few days trying to improve the performance of a frozen Python app - specifically, the time it takes to start up and present a login window. Most of the improvements were down to good old-fashioned writing of better code, but I also put together a couple of tricks to help shave off even more milliseconds. They both target one of the major sources of slowness when starting up a Python app: imports.

Import processing is an area where an app written in Python is at a big disadvantage compared to compiled languages such as C or Java. In such languages the equivalent of an "import" statement is usually a compile-time directive that sucks in code from another file, and its impact on startup time is negligible. In Python, the import statement is a run-time directive that goes looking for the named module, compiles the source file if necessary, loads the compiled code into memory, executes the code in a new namespace, and finally returns the resulting module object. Clearly the fewer imports you can do at application startup, the better.
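As a rough illustration of that run-time cost, the sketch below times a first import against a repeated one, and shows the simplest way to keep an import off the startup path: move it inside the function that needs it. It is written for Python 3; the names timed_import and render_report, and the choice of the decimal and json modules, are placeholders for illustration and are not taken from the app discussed here.

```python
import importlib
import sys
import time


def timed_import(name):
    """Import the named module and return (module, seconds taken)."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start


def render_report(data):
    # Deferred import: the cost is paid on the first call, not at startup.
    # Later calls find the module in sys.modules, so the statement is cheap.
    import json
    return json.dumps(data, sort_keys=True)


if __name__ == "__main__":
    _, first = timed_import("decimal")
    _, second = timed_import("decimal")  # served from sys.modules if already loaded
    print("first import: %.6fs, repeat import: %.6fs" % (first, second))
    print(render_report({"b": 1, "a": 2}))
```

The exact timings depend on the machine and on which modules the interpreter has already loaded, so treat the numbers as indicative only.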

Lazy Imports

I first learned how important lazy imports can be from Andrew Bennetts, who works for Canonical on the Bazaar version control system. Most Python-related conferences in Australia feature Andrew giving a presentation on performance (most recently it was at PyCon AU with Making your python code fast) and he always mentions the lazy import mechanism used by Bazaar.
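To give a feel for the general idea, here is a minimal sketch of a lazy import written for Python 3. It is not Bazaar's actual mechanism; the LazyModule class is a hypothetical stand-in that postpones the real import until the first attribute access.

```python
import importlib


class LazyModule(object):
    """Stand-in that imports the real module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def _load(self):
        # Import on demand and remember the result for later accesses.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Only called for attributes that are not found on the proxy itself.
        return getattr(self._load(), attr)


# Cheap at startup: creating the proxy does not import anything.
textwrap = LazyModule("textwrap")

if __name__ == "__main__":
    # The real import happens here, on first use.
    print(textwrap.dedent("    the import happens on this line"))
```

A proxy like this trades a small per-access overhead for a faster startup, which only pays off for modules that are not needed on every run.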

Continue reading...

Feb. 5, 2010

A GIL Adventure (with a happy ending)

I just halved the running time of one of my test suites.

The tests in question are multi-threaded, and while they perform a lot of IO they still push the CPU pretty hard. For some time now, nose has been reporting a happy little message along these lines:

Ran 35 tests in 24.893s

I wouldn't have thought anything of it, but every so often this number would drop dramatically – often down to as little as 15 seconds. After a lot of puzzling, I realised that the tests would run faster whenever I had another test suite running at the same time. Making my computer work harder made these tests run almost twice as fast!

Could it be? Yes, I was finally seeing a manifestation of Python's dreaded Global Interpreter Lock - a.k.a. the "GIL of Doom". Because I'm running on a dual core system, the different threads in this test suite were spreading themselves over both processors and engaging in an epic GIL Battle that bogged down the whole process.

The typical response to this awful multi-core behaviour is "just use multiprocessing". That's not an option here, not least because these tests are supposed to be checking the thread safety of my code!
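For anyone who wants to see this kind of GIL contention for themselves, the following Python 3 sketch runs the same CPU-bound loop sequentially and in threads and reports the wall-clock time for each. The function names are illustrative and this is not the test suite described above. On a CPython build with a GIL the threaded run is typically no faster, and on a multi-core machine it may be slower; the actual numbers depend on the interpreter version and hardware.

```python
import threading
import time


def count_down(n):
    """Pure-Python busy loop; it needs the GIL for every iteration."""
    while n > 0:
        n -= 1
    return n


def run_sequential(n, workers):
    """Run the loop `workers` times in the current thread; return seconds."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, workers):
    """Run the loop once in each of `workers` threads; return seconds."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5000000
    print("sequential: %.3fs" % run_sequential(n, 2))
    print("threaded:   %.3fs" % run_threaded(n, 2))
```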

Continue reading...

Sept. 9, 2009

Mimetypes and Threading don't mix

I've just spent weeks (yes, weeks) battling a bug that turns out to have been caused by everyone's favourite broken stdlib module, mimetypes. I'm far from the first to be bitten by this module's strangeness – Jacob Rus has compiled a long list of reasons why the mimetypes module is pathologically broken, while Armin Ronacher recently got a 1000% speedup just by changing the way he imported things from the module (yes, 1000%).

So consider this another little heads-up about the mimetypes module: it doesn't play nice with threads. If two threads call mimetypes.guess_type at the same time, and the module happens to need to initialise its internal database, then one of the threads will go into an infinite recursive loop and blow your stack. What fun!

To be fair, the mimetypes module is slowly being converted into a healthy state, and this particular bug will be fixed in the next release. But in the meantime, if you need to do mimetype guesswork in Python, make sure you do it very carefully.
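One defensive pattern, sketched here in Python 3 on the assumption that the problem is confined to the lazy initialisation described above, is to initialise the database once before any threads start and to serialise later calls behind a lock. The helpers safe_guess_type and guess_in_threads are hypothetical names, not part of the mimetypes module.

```python
import mimetypes
import threading

# Initialise the module's database once, before any threads exist,
# so that no two threads can race to perform the first initialisation.
mimetypes.init()

# Extra caution: let only one thread at a time call into the module.
_guess_lock = threading.Lock()


def safe_guess_type(url):
    """Serialise calls to mimetypes.guess_type across threads."""
    with _guess_lock:
        return mimetypes.guess_type(url)


def guess_in_threads(urls):
    """Guess the type of each URL in its own thread; return {url: result}."""
    results = {}

    def worker(url):
        results[url] = safe_guess_type(url)

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(guess_in_threads(["index.html", "notes.txt"]))
```

The lock costs a little concurrency, but guessing a type is cheap, so the serialisation is unlikely to matter in practice.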
