Wednesday, April 30, 2008

flash (n)back

The PNAS article linked here found subjects could improve IQ with training. I've written a simple flash version of their protocol over a couple long evenings in haxe/flash.

The article methods list 2 stimuli, a moving box, and spoken letters. The test subject is to respond (click in my case) when the box position or the spoken letter is the same as it was 2 time steps ago. Where 2 is increased as the subject gets better. I didn't do sound, I just show a big letter. Clearly, the logical thing to do is use it for a couple weeks and then implement the sound when I'm smarter. (I've never really used swfmill, but I think that'd be useful here...)

The article is ambiguous about when the letter is to sound, I've made both the letter and the box appear at the same time. The default, as in the article is to have 3 seconds between events, and to show the box for 0.5 seconds. I also add some indication of whether the answer was correct (green +) or not (red -). That actually makes things more difficult as its distracting to see something new flash on the screen. It also keeps a running total of misses (didn't click when should have), correct, and incorrect (clicked when shouldn't have). The grid size is set at 3 * 3 as that's more than difficult enough for me, and it appears to be what they used in the article. It actually only has 8 positions as I use the center to display the letter. Another ambiguity from the article is the number of letters. I use 3. That's easily changeable in the code.

The length of the time step (time_step, 3000ms default), the amount of time to show the box and text (show_time, 500ms default) and the number of steps back (nback, 2 default) are settable via the url so the default equates to:
?time_step=3000&nback=2&show_time=500

Observations

1. It's freakin hard. I'm suck at it.
2. Haxe is nice, but jeez, I write ugly actionscript.
3. I like quick, pointless evening projects like this where I have a clear idea of the outcome. It's good for learning.
4. Whatever points there are against flash, it's easy to uh, "deploy".


Here's a live version of the app (untested in IE, but swfobject should do it's thing). Just click in the flash movie when there's something that's the same as 2 time-steps ago.
You can make it arbitrarily hard by sending in parameters on the url, see the links in the page for examples.

And here's the code. Get it from svn via:
svn co http://bpgeo.googlecode.com/svn/trunk/nback

Sunday, April 27, 2008

levenshtein in cython

EDIT(2): fix markup for <char *> casts ... fix malloc (see comments. thanks Bao).
NOTE: using a kwarg for limit slows things down. setting that to a required arg and using calloc for m2 speed things up to nearly as fast as the pylevenshtein.

Well, it seems to be popular to code up the levenshtein. I actually have a use for this and wanted to practice some Cython, so I've written a version. I used bearophile's recipe, wikibooks, this (from k4st on reddit) and this for reference. It follows bearophile's code closely, using only O(m) space instead of O(mn).


cdef extern from "stdlib.h":
ctypedef unsigned int size_t
size_t strlen(char *s)
void *malloc(size_t size)
void free(void *ptr)
int strcmp(char *a, char *b)


cdef inline size_t imin(int a, int b, int c):
if a < b:
if c < a:
return c
return a
if c < b:
return c
return b


cpdef int levenshtein(char *a, char *b, int limit=100):
cdef int m = strlen(a), n = strlen(b)
cdef char *ctmp
cdef int i = 0, j = 0, retval
cdef int achr, bchr

if strcmp(a, b) == 0:
return 0

if m > n:
ctmp = a;
a = b;
b = ctmp;
#a, b = b, a
m, n = n, m

# short circuit.
if n - m >= limit:
return n - m

cdef char *m1 = <char *>malloc((n + 2) * sizeof(char))
cdef char *m2 = <char *>malloc((n + 2) * sizeof(char))

for i from 0 <= i <= n:
m1[i] = i
m2[i] = 0

for i from 0 <= i <= m:
m2[0] = i + 1
achr = a[i]
for j from 0 <= j <= n:
bchr = b[j]
if achr == bchr:
m2[j + 1] = m1[j]
else:
m2[j + 1] = 1 + imin(m2[j], m1[j], m1[j + 1])

m1, m2 = m2, m1

retval = m1[n + 1]
free(m2)
free(m1)
return retval

I believe that's correct (if not, let me know!), it matches output from other versions (which i used fort testing) and it doesn't leak memory, so I must have done the malloc/free correctly despite my lack of C-fu.
Again, I'm not sure, but I think that's a pretty good example of how to mix python and C with cython as it's not much longer than the other python versions and pretty readable. And, it's very fast, it does 500K iterations of:
levenshtein('i ehm a gude spehlar', 'i am a good speller')
in 2.5 seconds.
bearophile's pysco-ed editDistanceFast took 3 minutes, 31 seconds to do the same (which is why I started this project...).

Literally, just before I was to post this, I found pylevenshtein which is even faster, does the 500K in 1.5 seconds. ho hum. It's doing more, handling unicode, and apparently doing some optimizations.... So, use that instead! -- and contribute a test-suite.
Next, I think I'll try a variant that allows transpositions (Damerau) and/or implement the BK-Tree, again cribbing off bearophile's recipe.


for reference, here are the contents of the setup.py to build the module.

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup( name = 'levenshtein',
ext_modules=[ Extension("levenshtein", sources=["levenshtein.pyx"] ), ],
cmdclass = {'build_ext': build_ext})