### k-means clustering in scipy

it's fairly simple to do clustering of points with similar z-values in scipy:

import numpy

import matplotlib

matplotlib.use('Agg')

from scipy.cluster.vq import *

import pylab

pylab.close()

# generate some random xy points and

# give them some striation so there will be "real" groups.

xy = numpy.random.rand(30,2)

xy[3:8,1] -= .9

xy[22:28,1] += .9

# make some z vlues

z = numpy.sin(xy[:,1]-0.2*xy[:,1])

# whiten them

z = whiten(z)

# let scipy do its magic (k==3 groups)

res, idx = kmeans2(numpy.array(zip(xy[:,0],xy[:,1],z)),3)

# convert groups to rbg 3-tuples.

colors = ([([0,0,0],[1,0,0],[0,0,1])[i] for i in idx])

# show sizes and colors. each color belongs in diff cluster.

pylab.scatter(xy[:,0],xy[:,1],s=20*z+9, c=colors)

pylab.savefig('/var/www/tmp/clust.png')

## Comments

kmeans2(numpy.array(zip(xy[:,0],xy[:,1],z)),3)

An explanation for this line would have been helpful for me:

zip(xy[:,0],xy[:,1],z)

numpy.array(zip(xy[:, 0], xy[:, 1], z))

is a weird (and slow) way of writing

numpy.column_stack([xy, z])

Anyway it's clearer in two steps:

data = numpy.column_stack([xy, z])

vq.kmeans2(data, 3)

where `data` is an array of 30 points in 3d where the x, y values are given by the original `xy` array and the z-values are the new `z` array. All together, it is passed to the kmeans algorithm.

Take a look here for an explanation:

Using scipy's kmeans2 function in python