Bioinformatics

Posts

Python Mapscript Tricks

February 23, 2008

I mentioned previously a site which uses google maps with a WMS. The problem with using points (or labels) in a tiled application such as google maps, or OpenLayers is that each tile can only draw its own contents. So if you draw a point with a radius of 10 pixels whose center is 2 px away from the edge of the tile, then 8px of the entire 20px will be chopped. By default, those lost 8 pixels will not be drawn in the adjacent tile because the center of the point does not fall in that tile. so that gives something that looks like this for 2 adjacent tiles: It's hard to even tell that those tiles belong together! Using some python mapscript and PIL , it's pretty simple to make those look like this( except I dont know how to tell blogger not to add the spacin...) : That's actually the same WMS request(s), just changing the call from a simple WMS CGI script that calls mapserver to a WSGI script that uses python mapscript and PIL. The script: 1. takes the current bounding bo...

Open Source GIS

February 21, 2008

Just saw this linked from Sean's post . The quotes in there are absurd. But reminded me there's an important point in favor of OSS that I haven't seen. You can still get help directly via the email list or IRC from F Warmerdam , it's primary author. Likewise for the developers of Mapserver and OpenLayers and PostGIS . I wonder if the lead developers of ESRI products spend their off-time perusing the forum's or mailing lists and answering questions? ( I don't know, they may. But I suspect not. ) There's something to be said for enjoying what you do. And I think that's very true in the case of those in the open-source community. Happy coders make better software. Any programmer that denies that will leave me flabbergasted. I just don't understand how I could be effective only clicking the menus that were provided to me if I chose a black-box solution. But, call me crazy, I like linux. Also, in these parts, if you lock your single-speed bike to a woo...

Flash-y Map-plication vs. 500 marker limit

February 12, 2008

I've been pushing capabilities of javascript mapping frameworks. There's a real limit on the number of markers a browser can display without getting bogged down. 500 is a good limit, and that's probably too high for internet explorer. You can put as many markers as you like via WMS tiles. You can even take map clicks, send them back as AJAX queries and open an info window. I recently helped my friend do this, and we have a pretty snappy map displaying 5,000+ clickable "markers" no problem. It's actually not markers, just tiles, but it works quite well: That's a snapshot of the google maps application, with an info window that appears when a marker is clicked. It sends a GetFeatureInfo request back to a mapserver WMS. Vector drawing in the browser is limited. The amazing efforts of the OpenLayers , featureserver projects make this difficult to assert, as they abstract away all of the browser incompatibilities and give a nice platform to do real vector edit...

python bioinformatics

February 10, 2008

There's a new article out in BMC Bioinformatics with a comparison of the speed and length of programs from various languages. This article was sent to the biology in python ( BIP ) mailing list. Looking at the code, it's not that bad, but it is clear that the authors are not pythonistas, and that the reviewers have done a great job on the actual paper, but likely there was no thorough review of the code. The authors define their own max() function that needlessly overrides python's built-in, and they use the code: line.rstrip('/n') that indicates there was not a thorough understanding of python, or a complete code review. Even a non-python programmer should have seen the intent was to strip a newline '\n', but the operation is not inplace, so the desired behavior could be achieved by: line = line.rstrip('\n') Syntactical mistakes aside, python was given a poor review on speed. Andrew Dalke, of wide-finder (and general python-bio) fame ran the alig...

parallel blasts using python's pp module

February 02, 2008

BLAST handles utilizes multiple cores for some scenarios using the -a flag. however, i often do full genome blasts -- blasting all chromosomes of one organism against all others. For the case of rice (O.s.) against itself, this is 124 jobs. there are simple tools in python to run a queued blast in which on my 8 core machine, each core will run 1 of those blasts, as a job finishes, the pp or parallel python module starts the next job, based on the number of cpus it has detected for your machine. the syntax for the script is: python pblast.py rice_rice_10kmers where rice_rice_10kmers is the section in a config/.ini file to get the parameters. My fasta directory looks like: $ ls /tmp/rice/fasta/ ricetenkmers_chr01.fasta ricetenkmers_chr04.fasta ricetenkmers_chr07.fasta ricetenkmers_chr10.fasta ricetenkmers.order ricetenkmers_chr02.fasta ricetenkmers_chr05.fasta ricetenkmers_chr08.fasta ricetenkmers_chr11.fasta ricetenkmers_chr03.fasta ricetenkmers_chr06.fasta ricetenkmers_chr...

tinycc

November 26, 2007

i've been _trying_ to learn C. tinycc, beside being tiny, it compiles very quickly, allowing you to do cool things like script in C #!/usr/bin/tcc -run #include int main(int argc, char *argv[]) { printf("Hello World %s, %s", argv[0], argv[1]); return 0; } and then run as ./file.c arg_1 which makes it easier for those c-fu to guess and check. it also allows such nice things as c in python which is like pyinline, but uses ctypes and doesn't need write access.

Sorting by proximity to a date in PostgreSQL

October 11, 2007

postgreSQL has great support for dates, => SELECT '2007-08-23'::date - '2006-09-14'::date as days; days ------ 343 given a date column and a date, to find the nearest date, you can "extract the epoch", here, i used ABS as i just want the nearest date, before or after. SELECT *, ABS(EXTRACT(EPOCH FROM(date - '2006-08-23'))::BIGINT) as date_order FROM record WHERE well_id = 1234 ORDER BY date_order limit 1 i suppose this could make a nice PL/PGSQL function...

k-means clustering in scipy

September 29, 2007

it's fairly simple to do clustering of points with similar z-values in scipy: import numpy import matplotlib matplotlib.use('Agg') from scipy.cluster.vq import * import pylab pylab.close() # generate some random xy points and # give them some striation so there will be "real" groups. xy = numpy.random.rand(30,2) xy[3:8,1] -= .9 xy[22:28,1] += .9 # make some z vlues z = numpy.sin(xy[:,1]-0.2*xy[:,1]) # whiten them z = whiten(z) # let scipy do its magic (k==3 groups) res, idx = kmeans2(numpy.array(zip(xy[:,0],xy[:,1],z)),3) # convert groups to rbg 3-tuples. colors = ([([0,0,0],[1,0,0],[0,0,1])[i] for i in idx]) # show sizes and colors. each color belongs in diff cluster. pylab.scatter(xy[:,0],xy[:,1],s=20*z+9, c=colors) pylab.savefig('/var/www/tmp/clust.png')

using python mapscript to create a shapefile and dbf

July 17, 2007

i always have trouble remembering how to use mapscript. it's pretty simple, but the docs are hard to find and the test cases (though excellent!) have a lot of abstraction. heres some code that creates a shapefile and dbf (using another module). and does a quick projection at the start. import mapscript as M import random from dbfpy import dbf ######################################### # do some projection ######################################### p = 'POINT(466666 466000)' shape = M.shapeObj.fromWKT(p) projInObj = M.projectionObj("init=epsg:32619") projOutObj = M.projectionObj("init=epsg:4326") shape.project(projInObj, projOutObj) print shape.toWKT() ######################################### # create a shapefile from scractch ######################################### ms_dbf = dbf.Dbf("/tmp/t.dbf", new=True) ms_dbf.addField(('some_field', "C", 10)) ms_shapefile = M.shapefileObj('/tmp/t.shp', M.MS_SHAPEFILE_POLYGON) for ...

Note to self: using python logging module

June 27, 2007

import logging logging.basicConfig(level=logging.DEBUG ,format='%(asctime)s [[%(levelname)s]] %(message)s' ,datefmt='%d %b %y %H:%M' ,filename='/tmp/app.log' ,filemode='a') logging.debug('A debug message') logging.info('Some information') logging.warning('A shot across the bows')

Fix indentation in VIM

April 26, 2007

Often times, i get which has the indentation completely messed up, not just mixing tab/spaces, but really "whack" these commands seem to magically fix for at least 2 test cases: :set filetype=xml :filetype indent on :e gg=G

Using Python MiddleWare

April 25, 2007

just trying to figure this stuff out. it's pretty simple, but there's one level of abstraction through web.py. you can use middleware to add keys to the environ for example. http://groovie.org/files/WSGI_Presentation.pdf #!/usr/bin/python import web import random class hi(object): def GET(self,who='world'): web.header('Content-type','text/html') print "hello %s" % who class bye(object): def GET(self,who='world'): web.header('Content-type','text/plain') print "bye %s" % who for c in web.ctx.env: print c, web.ctx.env[c] class other(object): def GET(self): web.header('Content-type','text/plain') for c in web.ctx: print c, web.ctx[c] urls = ( '/bye/(.*)', 'bye' ,'/hi/(.*)' , 'hi' , '/.*' , 'other') class RandomWare(object): def __init__(self, app): ...

Install run, and benchmark mod_wsgi in < 10 minutes

April 24, 2007

svn checkout http://modwsgi.googlecode.com/svn/trunk/ modwsgi cd mod_wsgi ./configure make sudo make install # note where mod_wsgi.so went on your system echo "LoadModule wsgi_module /path/to/mod_wsgi.so" >> /path/to/apache2.conf mkdir /var/www/wsgitest/ cd /var/www/wsgitest/ vi .htaccess # [in .htaccess] Options +ExecCGI < Files hi.py > SetHandler wsgi-script </Files> # [ end .htaccess] vi hi.py # [in hi.py] #!/usr/bin/python import web class hi(object): def GET(self,who='world'): web.header('Content-type','text/html') print "hello %s" % who class bye(object): def GET(self,who='world'): web.header('Content-type','text/html') print "bye %s" % who urls = ( '/bye/?(.*)', 'bye' ,'/hi/?(.*)' , 'hi' ) application = web.wsgifunc(web.webpyfunc(urls, globals())) #[end hi.py ] you can then browse to http://localhost/wsgitest/hi.py/hi/there # see "hello t...

vim tricks

March 21, 2007

i've been trying to learn new stuff in vim, instead of doing same old. recently, i've been using :tabe to edit in tabs. lately, i've been trying the :sp to edit in splits. this set of tricks makes it even nicer: http://www.vim.org/tips/tip.php?tip_id=173 now i can type ctrl+j to move down or ctrl+k to move up a split and have that split maximized. both tabs and split make it simple to yank and paste between files. something for which i had been using the mouse.

postgresql and mysql: benchmark? how?

January 31, 2007

so somehow, my previous post on postgres / mysql made reddit , which i happened to be reading yesterday afternoon. i didnt even realize it was my post until following the link. there were a couple harsh comments stating that i found what i wanted to find. ... which were merited given the sensationalist way i presented the results (50%) and the careless use of the term "benchmark". and yes, the config for mySQL was the default. still, i just presented what i found. i was surprised noone commented on the hackish way that i checked to see if it was a protein sequence in perl, rather than mysql--or the coolness of pre-fetching in DBIx (which is available as eager loading or setting lazy=False in the mapper in python's sqlalchemy). re the comments on things to change in the postgresql.conf... i'll try at some point. are there any suggestions for mysql? the machine has 12G ram, 4CPUs. likely, the raid configuration (i dont know how it's set up) is not optimal, but tha...

real-world postgresql vs mysql benchmark

January 14, 2007

At my work, we have a large MySQL database (15 MyISAM tables, 21 million rows, 10Gigs size). After seeing the benchmarks showing that Postgres out-performs MySQL on multi-core machines (our new db server has 4 CPU's), I ported the database to PostgreSQL. We have begun using the DBIx perl module since Class::DBI is too sloooow. The DBIx module allows closer access to the generated SQL, and it allows "prefetch"ing which eliminates extra back-and-forth (and object creation) between the server and client. In addition, the connection string is in the script, not in the generated API. This makes it easy to benchmark as all that is required to change between db engines is to change the connection string. Using this script: use CogeX; use strict; # mysql #my $connstr = 'dbi:mysql:genomes:host:3306'; # postgresql my $connstr = 'dbi:Pg:dbname=genomes;host=host;port=5432'; my $s = CoGeX->connect($connstr, 'user', 'pass' ); my $rs = $s->result...

python jumble solver

November 26, 2006

over thanksgiving, i figured it'd be a good hack to solve the jumble . Originally, i tried to do all permutation of letter orders in the word but got confused and just did letter frequency: import sys word = len(sys.argv) > 1 and (sys.argv[1]).strip("\n").lower() or "egaugnal" words = [w.strip("\n").lower() for w in open('/usr/share/dict/web2') if len(w) == len(word)+1] def lfreq(w): wfreq = {} for letter in w: wfreq[letter] = letter in wfreq and wfreq[letter]+1 or 1 return wfreq wfreq = lfreq(word) match = [w for w in words if lfreq(w) == wfreq] print match that will print all matches for the word send in on the command line, assuming your dictionary file is in /usr/share/dict/web2 :-( . I tried for a while to do a recursive permute function which would take a word or list of letters and return all possible permutations: permute('abc') -> ['abc','acb','bca',bac',cba',cab'], which...

simple AJAX

November 18, 2006

this is the AJAX implementation i use: function jfetch(url,t,o) { var req = jfetch.xhr(); req.open("GET",url,true); req.onreadystatechange = function() { if(req.readyState == 4){ var rsp = req.responseText; if(t.constructor == Function) return t.apply(o,[rsp]); t = document.getElementById(t); t[t.value ==undefined ? 'innerHTML': 'value'] = rsp; req = null; } }; req.send(null); } jfetch.xhr = (window.ActiveXObject) ? function(){ return new ActiveXObject("Microsoft.XMLHTTP"); } : function(){ return new XMLHttpRequest()}; it's short, it only checks for the transport (ActiveX or XHR) once, and it takes either an element id or a call back function. and, i can understand it.