Posts

Showing posts from 2007

tinycc

i've been _trying_ to learn C. tinycc, beside being tiny, it compiles very quickly, allowing you to do cool things like script in C
#!/usr/bin/tcc -run

#include

int main(int argc, char *argv[]) {
printf("Hello World %s, %s", argv[0], argv[1]);
return 0;
}
and then run as
./file.c arg_1

which makes it easier for those c-fu to guess and check.
it also allows such nice things as c in python
which is like pyinline, but uses ctypes and doesn't need write access.

Sorting by proximity to a date in PostgreSQL

postgreSQL has great support for dates,

=> SELECT '2007-08-23'::date - '2006-09-14'::date as days;
days
------
343

given a date column and a date, to find the nearest date, you can "extract the epoch", here, i used ABS as i just want the nearest date, before or after.

SELECT *, ABS(EXTRACT(EPOCH FROM(date - '2006-08-23'))::BIGINT) as date_order FROM record WHERE well_id = 1234 ORDER BY date_order limit 1

i suppose this could make a nice PL/PGSQL function...

k-means clustering in scipy

it's fairly simple to do clustering of points with similar z-values in scipy:


import numpy
import matplotlib
matplotlib.use('Agg')
from scipy.cluster.vq import *
import pylab
pylab.close()

# generate some random xy points and
# give them some striation so there will be "real" groups.
xy = numpy.random.rand(30,2)
xy[3:8,1] -= .9
xy[22:28,1] += .9

# make some z vlues
z = numpy.sin(xy[:,1]-0.2*xy[:,1])

# whiten them
z = whiten(z)

# let scipy do its magic (k==3 groups)
res, idx = kmeans2(numpy.array(zip(xy[:,0],xy[:,1],z)),3)

# convert groups to rbg 3-tuples.
colors = ([([0,0,0],[1,0,0],[0,0,1])[i] for i in idx])

# show sizes and colors. each color belongs in diff cluster.
pylab.scatter(xy[:,0],xy[:,1],s=20*z+9, c=colors)
pylab.savefig('/var/www/tmp/clust.png')


using python mapscript to create a shapefile and dbf

i always have trouble remembering how to use mapscript. it's pretty simple, but the docs are hard to find and the test cases (though excellent!) have a lot of abstraction.

heres some code that creates a shapefile and dbf (using another module). and does a quick projection at the start.


import mapscript as M
import random
from dbfpy import dbf

#########################################
# do some projection
#########################################

p = 'POINT(466666 466000)'
shape = M.shapeObj.fromWKT(p)
projInObj = M.projectionObj("init=epsg:32619")
projOutObj = M.projectionObj("init=epsg:4326")
shape.project(projInObj, projOutObj)
print shape.toWKT()


#########################################
# create a shapefile from scractch
#########################################
ms_dbf = dbf.Dbf("/tmp/t.dbf", new=True)
ms_dbf.addField(('some_field', "C", 10))

ms_shapefile = M.shapefileObj('/tmp/t.shp', M.MS_SHAPEFILE_POLYGON)

for i in xrange(10):

Note to self: using python logging module

import logging

logging.basicConfig(level=logging.DEBUG
,format='%(asctime)s [[%(levelname)s]] %(message)s'
,datefmt='%d %b %y %H:%M'
,filename='/tmp/app.log'
,filemode='a')

logging.debug('A debug message')
logging.info('Some information')
logging.warning('A shot across the bows')

Fix indentation in VIM

Often times, i get which has the indentation completely messed up, not just mixing tab/spaces, but really "whack"

these commands seem to magically fix for at least 2 test cases:

:set filetype=xml
:filetype indent on
:e
gg=G

Using Python MiddleWare

just trying to figure this stuff out. it's pretty simple, but there's one level of abstraction through web.py. you can use middleware to add keys to the environ for example.
http://groovie.org/files/WSGI_Presentation.pdf


#!/usr/bin/python

import web
import random

class hi(object):
def GET(self,who='world'):
web.header('Content-type','text/html')
print "hello %s" % who

class bye(object):
def GET(self,who='world'):
web.header('Content-type','text/plain')
print "bye %s" % who

for c in web.ctx.env:
print c, web.ctx.env[c]

class other(object):
def GET(self):
web.header('Content-type','text/plain')
for c in web.ctx:
print c, web.ctx[c]

urls = ( '/bye/(.*)', 'bye'
,'/hi/(.*)' , 'hi'
, '/.*' , 'other')

class RandomWare(object):
def __init__(self, app):
self.your_app = app;

Install run, and benchmark mod_wsgi in < 10 minutes

svn checkout http://modwsgi.googlecode.com/svn/trunk/ modwsgi
cd mod_wsgi
./configure
make
sudo make install
# note where mod_wsgi.so went on your system
echo "LoadModule wsgi_module /path/to/mod_wsgi.so" >> /path/to/apache2.conf

mkdir /var/www/wsgitest/
cd /var/www/wsgitest/

vi .htaccess
# [in .htaccess]
Options +ExecCGI
< Files hi.py >
SetHandler wsgi-script
</Files>
# [ end .htaccess]

vi hi.py
# [in hi.py]

#!/usr/bin/python

import web

class hi(object):
def GET(self,who='world'):
web.header('Content-type','text/html')
print "hello %s" % who

class bye(object):
def GET(self,who='world'):
web.header('Content-type','text/html')
print "bye %s" % who

urls = ( '/bye/?(.*)', 'bye'
,'/hi/?(.*)' , 'hi' )

application = web.wsgifunc(web.webpyfunc(urls, globals()))

#[end hi.py ]


you can then browse to
http://localhost/wsgitest/hi.py/hi/there
# see "hello there"
http://localhost/wsgite…

vim tricks

i've been trying to learn new stuff in vim, instead of doing same old. recently, i've been using :tabe to edit in tabs. lately, i've been trying the :sp to edit in splits.
this set of tricks makes it even nicer:
http://www.vim.org/tips/tip.php?tip_id=173
now i can type ctrl+j to move down or ctrl+k to move up a split and
have that split maximized.
both tabs and split make it simple to yank and paste between files. something for which i had been using the mouse.

postgresql and mysql: benchmark? how?

so somehow, my previous post on postgres / mysql made reddit , which i happened to be reading yesterday afternoon. i didnt even realize it was my post until following the link.
there were a couple harsh comments stating that i found what i wanted to find. ... which were merited given the sensationalist way i presented the results (50%) and the careless use of the term "benchmark". and yes, the config for mySQL was the default. still, i just presented what i found.
i was surprised noone commented on the hackish way that i checked to see if it was a protein sequence in perl, rather than mysql--or the coolness of pre-fetching in DBIx (which is available as eager loading or setting lazy=False in the mapper in python's sqlalchemy).

re the comments on things to change in the postgresql.conf... i'll try at some point. are there any suggestions for mysql?
the machine has 12G ram, 4CPUs. likely, the raid configuration (i dont know how it's set up) is not optimal, but that is…

real-world postgresql vs mysql benchmark

At my work, we have a large MySQL database (15 MyISAM tables, 21 million rows, 10Gigs size). After seeing the benchmarks showing that Postgres out-performs MySQL on multi-core machines (our new db server has 4 CPU's), I ported the database to PostgreSQL.
We have begun using the DBIx perl module since Class::DBI is too sloooow. The DBIx module allows closer access to the generated SQL, and it allows "prefetch"ing which eliminates extra back-and-forth (and object creation) between the server and client. In addition, the connection string is in the script, not in the generated API. This makes it easy to benchmark as all that is required to change between db engines is to change the connection string. Using this script:

use CogeX; use strict;
# mysql
#my $connstr = 'dbi:mysql:genomes:host:3306';
# postgresql
my $connstr = 'dbi:Pg:dbname=genomes;host=host;port=5432';
my $s = CoGeX->connect($connstr, 'user', 'pass' );

my $rs = $s->resultset('…