Monday, November 26, 2007

tinycc

i've been _trying_ to learn C. tinycc, besides being tiny, compiles very quickly, allowing you to do cool things like scripting in C:
#!/usr/bin/tcc -run

#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("Hello World %s, %s\n", argv[0], argv[1]);
    return 0;
}
and then make it executable and run it as

chmod +x file.c
./file.c arg_1

which makes it easier for those of us short on c-fu to guess and check.
it also allows such nice things as c in python, which is like pyinline, but uses ctypes and doesn't need write access.
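a minimal sketch of that idea (my own example, not from that project): compile a C snippet into a shared library with tcc, then load it with ctypes. nothing here needs write access outside a temp directory; the function name and paths are just made up for illustration.

import os, tempfile, subprocess
from ctypes import CDLL

c_src = """
int add(int a, int b) { return a + b; }
"""

# write the C source to a temp dir and let tcc build a shared library from it
d = tempfile.mkdtemp()
src = os.path.join(d, "add.c")
lib = os.path.join(d, "libadd.so")
open(src, "w").write(c_src)
subprocess.check_call(["tcc", "-shared", "-o", lib, src])

# load the library with ctypes and call straight into the compiled C
libadd = CDLL(lib)
print libadd.add(2, 3)   # -> 5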

Thursday, October 11, 2007

Sorting by proximity to a date in PostgreSQL

PostgreSQL has great support for dates:

=> SELECT '2007-08-23'::date - '2006-09-14'::date as days;
days
------
343

given a date column and a target date, you can find the nearest row by extracting the epoch from the difference. here i used ABS since i just want the nearest date, whether it falls before or after:

SELECT *, ABS(EXTRACT(EPOCH FROM (date - '2006-08-23'))::BIGINT) AS date_order
FROM record
WHERE well_id = 1234
ORDER BY date_order
LIMIT 1;

i suppose this could make a nice PL/PGSQL function...
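in the meantime, a parameterized version from python might look something like this (a sketch: psycopg2 and the connection string are my assumptions, and record/well_id/date are just the names from the query above).

import psycopg2

def nearest_record(conn, well_id, target_date):
    """return the row from record whose date is closest to target_date."""
    cur = conn.cursor()
    cur.execute("""
        SELECT *, ABS(EXTRACT(EPOCH FROM (date - %s::date))::BIGINT) AS date_order
        FROM record
        WHERE well_id = %s
        ORDER BY date_order
        LIMIT 1""", (target_date, well_id))
    return cur.fetchone()

conn = psycopg2.connect("dbname=mydb")   # hypothetical connection string
print nearest_record(conn, 1234, '2006-08-23')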

Saturday, September 29, 2007

k-means clustering in scipy

it's fairly simple to do clustering of points with similar z-values in scipy:


import numpy
import matplotlib
matplotlib.use('Agg')
from scipy.cluster.vq import *
import pylab
pylab.close()

# generate some random xy points and
# give them some striation so there will be "real" groups.
xy = numpy.random.rand(30,2)
xy[3:8,1] -= .9
xy[22:28,1] += .9

# make some z values
z = numpy.sin(xy[:,1]-0.2*xy[:,1])

# whiten them
z = whiten(z)

# let scipy do its magic (k==3 groups)
res, idx = kmeans2(numpy.array(zip(xy[:,0],xy[:,1],z)),3)

# convert groups to rgb 3-tuples.
colors = ([([0,0,0],[1,0,0],[0,0,1])[i] for i in idx])

# show sizes and colors. each color belongs in diff cluster.
pylab.scatter(xy[:,0],xy[:,1],s=20*z+9, c=colors)
pylab.savefig('/var/www/tmp/clust.png')
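a quick sanity check i'd add after that (not in the original snippet): look at how many points landed in each cluster and where the centroids ended up.

# cluster sizes and centroid locations from the kmeans2 call above
print numpy.bincount(idx)
print res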


Tuesday, July 17, 2007

using python mapscript to create a shapefile and dbf

i always have trouble remembering how to use mapscript. it's pretty simple, but the docs are hard to find and the test cases (though excellent!) have a lot of abstraction.

here's some code that creates a shapefile and dbf (using another module), and does a quick projection at the start.


import mapscript as M
import random
from dbfpy import dbf

#########################################
# do some projection
#########################################

p = 'POINT(466666 466000)'
shape = M.shapeObj.fromWKT(p)
projInObj = M.projectionObj("init=epsg:32619")
projOutObj = M.projectionObj("init=epsg:4326")
shape.project(projInObj, projOutObj)
print shape.toWKT()


#########################################
# create a shapefile from scratch
#########################################
ms_dbf = dbf.Dbf("/tmp/t.dbf", new=True)
ms_dbf.addField(('some_field', "C", 10))

ms_shapefile = M.shapefileObj('/tmp/t.shp', M.MS_SHAPEFILE_POLYGON)

for i in xrange(10):
    ms_shape = M.shapeObj(M.MS_SHAPE_POLYGON)
    ms_line = M.lineObj()

    # build a ring of random points for this polygon
    for j in xrange(10):
        ms_line.add(M.pointObj(random.randint(0, 99), -random.randint(0, 99)))

    ms_shape.add(ms_line)
    ms_shapefile.add(ms_shape)

    # add a matching dbf record for each shape
    rec = ms_dbf.newRecord()
    rec['some_field'] = 'hi' + str(i)
    rec.store()

ms_dbf.close()

Wednesday, June 27, 2007

Note to self: using python logging module


import logging

logging.basicConfig(level=logging.DEBUG
,format='%(asctime)s [[%(levelname)s]] %(message)s'
,datefmt='%d %b %y %H:%M'
,filename='/tmp/app.log'
,filemode='a')

logging.debug('A debug message')
logging.info('Some information')
logging.warning('A shot across the bows')
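once basicConfig has been called like that, other modules can just grab a named logger and their messages end up in the same file. a small sketch (the module and logger names are made up):

# in some other module of the app, e.g. myapp/db.py (hypothetical)
import logging

log = logging.getLogger('myapp.db')

def connect():
    log.info('connecting to the database')
    log.debug('with some noisier detail')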

Thursday, April 26, 2007

Fix indentation in VIM

Often times, i get files (xml, in my case) which have the indentation completely messed up, not just mixed tabs/spaces, but really "whack".

these commands seem to magically fix it, for at least the 2 test cases i've tried:

:set filetype=xml
:filetype indent on
:e
gg=G

Wednesday, April 25, 2007

Using Python MiddleWare

just trying to figure this stuff out. it's pretty simple, but there's one level of abstraction through web.py. you can use middleware to add keys to the environ, for example (a plain-wsgi version of the same idea is sketched after the code below).
http://groovie.org/files/WSGI_Presentation.pdf


#!/usr/bin/python

import web
import random

class hi(object):
    def GET(self, who='world'):
        web.header('Content-type', 'text/html')
        print "hello %s" % who

class bye(object):
    def GET(self, who='world'):
        web.header('Content-type', 'text/plain')
        print "bye %s" % who

        # dump the environ (including the key added by the middleware)
        for c in web.ctx.env:
            print c, web.ctx.env[c]

class other(object):
    def GET(self):
        web.header('Content-type', 'text/plain')
        for c in web.ctx:
            print c, web.ctx[c]

urls = ( '/bye/(.*)', 'bye'
       , '/hi/(.*)' , 'hi'
       , '/.*'      , 'other')

class RandomWare(object):
    def __init__(self, app):
        self.your_app = app

    def __call__(self, environ, start):
        # every request gets a random number stuffed into its environ
        environ['hello'] = random.random()
        return self.your_app(environ, start)

def random_mw(app):
    return RandomWare(app)

if __name__ == "__main__":
    web.run(urls, globals(), random_mw)
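for comparison, here is the same middleware idea with web.py stripped away: just a bare wsgi app plus the wrapper. this is my own sketch using wsgiref from the standard library, not part of the original post.

import random
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # the app just reports whatever the middleware put into the environ
    start_response('200 OK', [('Content-type', 'text/plain')])
    return ["hello is %s\n" % environ.get('hello')]

class RandomWare(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        environ['hello'] = random.random()
        return self.app(environ, start_response)

if __name__ == '__main__':
    make_server('localhost', 8080, RandomWare(app)).serve_forever()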

Tuesday, April 24, 2007

Install, run, and benchmark mod_wsgi in < 10 minutes

svn checkout http://modwsgi.googlecode.com/svn/trunk/ modwsgi
cd modwsgi
./configure
make
sudo make install
# note where mod_wsgi.so went on your system
echo "LoadModule wsgi_module /path/to/mod_wsgi.so" >> /path/to/apache2.conf
# restart apache so the new module gets loaded
sudo apachectl restart

mkdir /var/www/wsgitest/
cd /var/www/wsgitest/

vi .htaccess
# [in .htaccess]
Options +ExecCGI
<Files hi.py>
SetHandler wsgi-script
</Files>
# [ end .htaccess]

vi hi.py
# [in hi.py]

#!/usr/bin/python

import web

class hi(object):
    def GET(self, who='world'):
        web.header('Content-type', 'text/html')
        print "hello %s" % who

class bye(object):
    def GET(self, who='world'):
        web.header('Content-type', 'text/html')
        print "bye %s" % who

urls = ( '/bye/?(.*)', 'bye'
       , '/hi/?(.*)' , 'hi' )

application = web.wsgifunc(web.webpyfunc(urls, globals()))

#[end hi.py ]


you can then browse to
http://localhost/wsgitest/hi.py/hi/there
# see "hello there"
http://localhost/wsgitest/hi.py/bye/bye%20bye
# see "bye bye bye"



(meaningless) Benchmarking:
change last line in hi.py to:
if __name__ == "__main__": web.run(urls,globals())
and save as cgi.py

cgi
$ ab -n 1000 -c 30 http://localhost/wsgitest/cgi.py/hi/there | grep 'Requests per second'

Requests per second: 4.08 [#/sec] (mean)

wsgi
$ ab -n 1000 -c 30 http://localhost/wsgitest/hi.py/hi/there | grep 'Requests per second'

Requests per second: 351.05 [#/sec] (mean)

Wednesday, March 21, 2007

vim tricks

i've been trying to learn new stuff in vim instead of doing the same old things. recently, i've been using :tabe to edit in tabs. lately, i've been trying :sp to edit in splits.
this set of tricks makes it even nicer:
http://www.vim.org/tips/tip.php?tip_id=173
now i can type ctrl+j to move down or ctrl+k to move up a split and
have that split maximized.
both tabs and splits make it simple to yank and paste between files, something i had previously been using the mouse for.

Wednesday, January 31, 2007

postgresql and mysql: benchmark? how?

so somehow, my previous post on postgres / mysql made reddit, which i happened to be reading yesterday afternoon. i didn't even realize it was my post until following the link.
there were a couple of harsh comments stating that i found what i wanted to find ... which was merited, given the sensationalist way i presented the results (50%) and the careless use of the term "benchmark". and yes, the config for mySQL was the default. still, i just presented what i found.
i was surprised no one commented on the hackish way i checked for a protein sequence in perl rather than in mysql--or on the coolness of pre-fetching in DBIx::Class (which is available as eager loading, or setting lazy=False in the mapper, in python's sqlalchemy).

re the comments on things to change in postgresql.conf... i'll try them at some point. are there any suggestions for mysql?
the machine has 12G of ram and 4 CPUs. likely the raid configuration (i don't know how it's set up) is not optimal, but that is out of my hands.

for a "real benchmark" it'd be nice to do this in sqlalchemy with the schema written in python and then just change the database engine between mysql/postgresql/sqlite. any pointers on how one would go about creating a "real benchmark"?
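a rough sketch of the engine-swapping part might look like this (the table, columns, and connection strings are made up, and the timing is just wall-clock):

import time
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, select

# the schema lives in python; only the connection url changes per engine
metadata = MetaData()
feature = Table('feature', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(64)))

# dialect names vary by sqlalchemy version (postgres vs postgresql)
urls = ['sqlite:///bench.db',
        'mysql://user:pass@host/genomes',
        'postgres://user:pass@host/genomes']

for url in urls:
    engine = create_engine(url)
    metadata.create_all(engine)   # same DDL issued to each backend
    conn = engine.connect()
    t0 = time.time()
    rows = conn.execute(select([feature]).where(feature.c.name.like('At1g%'))).fetchall()
    print url, len(rows), time.time() - t0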

actually, that would be a good ask reddit topic: "How to design a 'real database benchmark'?"

Sunday, January 14, 2007

real-world postgresql vs mysql benchmark

At my work, we have a large MySQL database (15 MyISAM tables, 21 million rows, 10 gigs in size). After seeing benchmarks showing that Postgres out-performs MySQL on multi-core machines (our new db server has 4 CPUs), I ported the database to PostgreSQL.
We have begun using the DBIx::Class perl module since Class::DBI is too sloooow. DBIx::Class allows closer access to the generated SQL, and it allows "prefetch"ing, which eliminates extra back-and-forth (and object creation) between the server and client. In addition, the connection string is in the script, not in the generated API. This makes it easy to benchmark, as all that is required to switch between db engines is to change the connection string. Using this script:

use CoGeX;
use strict;
# mysql
#my $connstr = 'dbi:mysql:genomes:host:3306';
# postgresql
my $connstr = 'dbi:Pg:dbname=genomes;host=host;port=5432';
my $s = CoGeX->connect($connstr, 'user', 'pass' );

my $rs = $s->resultset('Feature')->search(
    {
        'feature_type.name'  => 'CDS',
        'feature_names.name' => { like => 'At1g%' },
    },
    {
        join     => [ 'feature_names', 'feature_type' ],
        prefetch => [ 'feature_names', 'feature_type' ],
    }
);

while (my $feat = $rs->next()) {
    my $fn   = $feat->feature_names;
    my $type = $feat->feature_type->name;

    map { print $_->name . ":" . $type . "\t" } $fn->next();
    print "\n";

    # this prefetch avoids n calls where n is the number of sequences
    foreach my $seq ($feat->sequences({}, { prefetch => "sequence_type" })) {
        print $seq->sequence_data if $seq->sequence_type->name eq 'protein';
    }
    print "\n\n";
}


fetches the protein sequence of any coding sequence (CDS) that has a feature name starting with 'At1g', which in our database means any CDS on chromosome 1 of arabidopsis.
The script consistently runs in 45 seconds on MySQL and in 29-31 seconds on PostgreSQL. Other scripts show about the same difference--PostgreSQL finishes in about 60-70% of the time the MySQL versions take. Or, more dramatically: MySQL is 50% slower. That's pretty good for no change in the API, and all tables have the same indexing and structure.
Postgres was set up with the defaults except for these values in postgresql.conf:
shared_buffers = 40000
max_connections = 200
work_mem = 4096
effective_cache_size = 10000
This was the most concise indication of values to set, though likely closer tuning could improve performance (ahem, suggestions welcome).
An added benefit is that we can now push more work into the database server using PL/Perl or another PL language, which can further reduce network back-and-forth and avoid creating perl objects when not necessary.