Bioinformatics

Posts

Showing posts from January 17, 2010

Loopless programming (for calculating methylation types)

January 18, 2010

DNA Methylation is used in plants as a means of epi-genetic regulation. Bi-sulfite sequencing is a method used to determine the methylation pattern of a given set of DNA. Methylation occurs at 'C'ytosines. In plants, there are 3 types of methylation , determined by the nucleotides that follow the 'C': CG: a 'G' follows the 'C' CHG: anything but a 'G' follows the 'C' and a 'G' follows that CHH: no 'G' in the 2 subsequent nucleotides. So, programmatically, it's a trivial matter to calculate the type of methylation that can occur at each cytosine in a given sequence. Though, once the edges and edge-cases and reverse-complement are handled, it becomes a few lines of code full of loops and if's. For python (and most scripting languages) that's slow and loopy and iffy. The rest of this code explains how I utilized numpy to make a fast methylation-type calculator without loops or ifs. Loopless Given a nump...