Showing posts from June 6, 2010

python tools for bioinformatics II: group-ing

This is the second post describing Python tools and idioms I've found useful in bioinformatics. This post covers grouping items, either with a disjoint set or with Python's itertools.groupby. I was introduced to both by the post-doc who sits behind me.

Grouper

Grouper is an implementation of a disjoint set, initially from this recipe by Michael Droettboom and included in matplotlib.cbook. The slightly modified implementation we use is here. The docstring gives a decent idea of usage:

    """
    This class provides a lightweight way to group arbitrary objects
    together into disjoint sets when a full-blown graph data structure
    would be overkill.

    Objects can be joined using .join(), tested for connectedness using
    .joined(), and all disjoint sets can be retrieved using list(g)

    The objects being joined must be hashable.

    >>> g = Grouper()
    >>> g.join('a', 'b')
    >>> g.join('b