Showing posts from January 17, 2010

Loopless programming (for calculating methylation types)

DNA Methylation is used in plants as a means of epi-genetic regulation. Bi-sulfite sequencing is a method used to determine the methylation pattern of a given set of DNA. Methylation occurs at 'C'ytosines. In plants, there are 3 types of methylation, determined by the nucleotides that follow the 'C':
CG: a 'G' follows the 'C'
CHG: anything but a 'G' follows the 'C' and a 'G' follows that
CHH: no 'G' in the 2 subsequent nucleotides.
So, programmatically, it's a trivial matter to calculate the type of methylation that can occur at each cytosine in a given sequence. Though, once the edges and edge-cases and reverse-complement are handled, it becomes a few lines of code full of loops and if's. For python (and most scripting languages) that's slow and loopy and iffy. The rest of this code explains how I utilized numpy to make a fast methylation-type calculator without loops or ifs.

Given a numpy array `seq`, it…