Posts

Showing posts from September 28, 2008

genedex.fasta with numpy.memmap

EDIT: added job posting to comments.
I've been working a bit on genedex, I'm still not happy with the way it stores features. Which is a huge pickle of dictionaries where every dictionary is a 'feature' that looks like: {'name':'At2g26540', 'start': 1234, 'stop': 3456, 'strand': 1, 'chr': 2}. So the only way to do a search is by location--and that is _very_ fast, thanks to rtree, but there's no way to search by name or any other attribute--and an entire organism is loaded into memory at once--that part actually works out ok, but it feels dirty. I quickly wrote an SQLAlchemy backed interface to a simple db schema do allow this sort of searching here: http://code.google.com/p/genedex/source/browse/trunk/genedex/models/sqla.py. That already supports Feature.upstream(), downstream(), etc. methods, but it will work nicely once python supports sqlite rtree without any extra work--for now, it just uses BTree indexes on the st…