numpy + GDAL = agoodle

It's been a while since I've posted anything geo-related. Just want to let folks know about agoodle a project that makes it easy to access raster geo files as numpy arrays (thanks to GDAL), and to query them with polygons. The simplest way to see it in action is to go to landsummary, a project Josh Livni and I worked on a couple years ago, and query the map with a polygon (click "Draw Polygon" on the right side of the map first). Then you'll see a nice summary and a pie-chart (due to some google-chart-api usage by Josh) of the land-use types in the polygon you queried. The backend of that is agoodle. The map is, of course, openlayers.

You can see the usage on the github page but basically, it'll be something like: which gives you a dictionary of raster grid code keys to the area values-- meaning, for each landclass, it tells you the sum area of cells of that class that fall inside the polygon. Note that summarize_wkt can also take keyword arguments for raster_epsg and poly_epsg, so you can query with a polygon that's in a different projection than the raster.

The library could use some work. There's sketchiness about "inside", so the sum of the areas in the returned dictionary will not match the area of the polygon. Internally, we use matplotlib's nxutils which does a very-fast point in poly test for every cell in the raster that's within the bounding box of the polygon. So any cell that passes that test will be included in the results--all or nothing, there's no corrections for the case where a cell is half in the polygon. This is not a problem unless the polygon is small relative to the size of the raster cells (or the perimeter of the query polygon is large). Patches Welcome.

Check out the code, we made gdal, numpy and nxutils do all the hard work, but it comes together pretty well. If there's other opensource projects out there that do this well, let me know and i'll link to them. I'm posting this because as far as I know, no others exist.

Comments

Popular posts from this blog

filtering paired end reads (high throughput sequencing)

python interval tree

needleman-wunsch global sequence alignment -- updates and optimizations to nwalign