Posts

Showing posts from July 11, 2010

GSNAP

Aligners Since starting the methylcoder project, I've been using the bowtie short read aligner . It's very fast, uses very little memory, aligns Illimina, SOLID, and colorspace reads, and has enough options to keep you busy (including my favorite: --try-hard). There's a new short-read aligner in my feed-reader each week. I wish, as a service, they'd tell me what they do that bowtie doesn't. There are 2 scenarios I know of that bowtie doesn't handle. As with many aligners, bowtie creates an index of the reference genome. It uses a suffix array compressed with the Burrows-Wheeler Transform (BWT). It uses 32 bit integers to store the offsets into that index so it's limited to under 4 gigabases of reference sequence. This is more than enough to hold the human genome and, so is not a problem for most users, but, for BS-treated alignments, I need to store a converted (Cytosine to Thymine) forward, and reverse copy of the genome which doubles the size of the r