fast python with shedskin

There's a new release of the shedskin compiler. It can generate fast shared libraries that can be imported from CPython. It can also create standalone binaries, so I thought I'd see how it did on some code from this BMC Bioinformatics article compared to psyco and CPython.
I took this iterative, brute-force (Needleman-Wunsch?) alignment code and modified it slightly. That's pasted here. (Notice the first line! That's how it appears in the original code.) The modifications allow shedskin to infer the function and variable types. Plus, there are a couple of changes I made that improve the run-time in all cases. The original also defines its own max() function, which is unnecessary given python's builtin max(); however, psyco does run much faster using the hand-coded max(). For the shedskin run, I removed that extra code and used shedskin's builtin 'cause it made me feel better.
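If you haven't seen this kind of code, here's a minimal sketch of a brute-force global alignment score written the way shedskin likes it: plain functions, lists that only ever hold ints, nothing dynamic. This is not the article's code. The name nw_score and the match/mismatch/gap values are made up for illustration, and it only computes the score (no traceback).

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch style global alignment score (score only)."""
    # prev[j] is the best score for aligning the first i-1 chars of a
    # with the first j chars of b; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # column 0: a[:i] aligned against nothing
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                diag = prev[j - 1] + match
            else:
                diag = prev[j - 1] + mismatch
            up = prev[j] + gap        # gap in b
            left = cur[j - 1] + gap   # gap in a
            cur.append(max(diag, up, left))  # builtin max, no hand-rolled one
        prev = cur
    return prev[len(b)]


if __name__ == "__main__":
    print(nw_score("GATTACA", "GCATGCU"))
```

Every variable keeps one type for its whole life, which is the sort of thing that lets a type-inferring compiler do its job without annotations.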

The python code was run as
$ time python -c "import alignment; alignment.imain()"

To run with psyco, the only change from the pasted script is to add
'import psyco; psyco.full()' at the top.
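If you want the same script to run whether or not psyco is installed, a guarded import is one way to do it. This is just a sketch; enable_psyco is a made-up helper name, not something from psyco or the original script.

```python
def enable_psyco():
    """Turn on psyco if it is installed; return True when it was enabled."""
    try:
        import psyco  # optional third-party module, may not be present
    except ImportError:
        return False
    psyco.full()  # ask psyco to compile every function it can
    return True


if __name__ == "__main__":
    print(enable_psyco())
```

Call it once at the top of the script, before the alignment code runs.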

To run with a shedskin-built executable (after removing the max()):
$ shedskin alignment.py # generates Makefile
$ make
$ time ./alignment

To run with shedskin as a shared (.so) library:
$ shedskin -e -n alignment.py # generates Makefile
$ make
$ time python -c "import alignment; print alignment.imain()"
# python finds and imports the shared library (.so) before the .py file.
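If you want to be sure which file python actually picked up, the module's __file__ attribute tells you. A small sketch; is_compiled is a made-up helper, and the suffix check is a simplification.

```python
import os


def is_compiled(module):
    """True if the module was loaded from a compiled extension file."""
    # Builtin modules may have no __file__ at all, so fall back to "".
    path = getattr(module, "__file__", "") or ""
    return os.path.splitext(path)[1] in (".so", ".pyd")


if __name__ == "__main__":
    import json
    print(is_compiled(json))  # json is a pure-python package
```

With the shedskin-built module in the current directory, is_compiled(alignment) should come back True if the .so really was the one imported.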

Timing:
Time reported is real (wall-clock) time.
Python 2.5.1: 19.030s
Psyco: 1.1336s
Shedskin shared: 0.921s
Shedskin binary: 0.818s
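To put those numbers in relative terms, here's a quick sketch that divides the CPython time by each of the others. The times are just the ones reported above, so treat the ratios as rough.

```python
# Wall-clock times in seconds, copied from the timing list above.
timings = {
    "Python 2.5.1": 19.030,
    "Psyco": 1.1336,
    "Shedskin shared": 0.921,
    "Shedskin binary": 0.818,
}

baseline = timings["Python 2.5.1"]
speedups = {}
for name in timings:
    # How many times faster than plain CPython each option ran.
    speedups[name] = baseline / timings[name]

# Print slowest first, fastest last.
for name in sorted(speedups, key=speedups.get):
    print("%-16s %5.1fx" % (name, speedups[name]))
```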

CPython is much slower at 19 seconds compared to ~1 second for psyco and shedskin. The output format for the shedskin binary is slightly different because it calls the stuff in __main__, but they all generate the same alignments. It's interesting to look at the alignment.ss.py that shedskin generates, as you can see all the inferred types. The alignment.cpp contains the generated C++ code, which is also quite readable. It's also nice to be able to get a binary executable without jumping through any extra hoops.

Shedskin was very easy to use and faster than psyco for this case. I just pulled from SVN and it worked out of the box. It now has support for sets and regular expressions, and seems quite active. I can see using shedskin for purely brute-force stuff where numpy is no help and I might otherwise have to resort to cython. I kinda like cython, but it's nice just to be able to get fast code with little to no modification.


EDIT:
jython 2.3a0 runs this unaltered in 42 seconds.

Comments

Anonymous said…
More comments, and more benchmarking in D:

http://leonardo-m.livejournal.com/57439.html
