AUSTIN POOL MAP

by @RyanKeisler January 18, 2014

With the launch of small, low-cost imaging satellites from the likes of SkyBox and Planet Labs, the amount of publicly accessible satellite imagery is about to explode. Unless you're interested in a very specific location, manually analyzing these images is going to be difficult; you need some machine vision to do the heavy lifting for you.

For this weekend hack I decided to look for swimming pools in Austin, TX. Swimming pools because they're easy, and Austin because it's a city I know well. Code available here.

the map

CLICK HERE FOR FULL-SCREEN VERSION.

Details

image data

First I defined a big rectangle centered on downtown Austin. After learning how to convert from latitude/longitude/zoom to the X/Y/Z tiling scheme, I downloaded 66,000 images from Mapbox (=66k urlretrieve calls). Satellite imagery isn't available in the free Mapbox plan, so I had to cough up $5.
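The latitude/longitude/zoom to X/Y/Z conversion mentioned above is the standard "slippy map" tile math used by Mapbox, OpenStreetMap, and friends. A minimal sketch in Python (the downtown-Austin coordinates below are my own illustration, not values from the post):

```python
import math

def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Convert latitude/longitude to slippy-map X/Y tile indices at a zoom level."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom  # tiles per side at this zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# e.g. a tile near downtown Austin
x, y = latlon_to_tile(30.2672, -97.7431, 15)
```

Looping this function over a grid of tiles covering the bounding rectangle gives the 66,000 tile URLs to fetch.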

where are the pools?

I chose to search for swimming pools because they're really easy to find. There just aren't a lot of other bus-sized, aquamarine blobs in these images.

The algorithm is pretty simple, and the basic idea was to hand-engineer features that correspond to contiguous chunks of blue pixels. More specifically,

  1. Randomly generate 60 colors: 50 shades of blue and 10 shades of non-blue.
  2. For each color, and for each image, measure how similar each pixel is to that color, then record the maximum and mean similarity across the image.
  3. That set of 60 (max, mean) pairs are the features we'll train on. We've reduced the (256x256x3)-dimensional input space to a 120-dimensional feature space.
  4. Hand-label a few percent of the images. This was 30 minutes of "no pool", "pool", ..., "no pool" - not quite as boring as it sounds.
  5. Train a classifier on these features. The classifier outputs the probability that an image contains a pool. I went with the ExtraTreesClassifier (a random forest variant) from the amazing scikit-learn machine learning package, because it's easy to use and robust to outlying data.
  6. Run the classifier on all 66,000 images.
  7. Now we can predict the locations of all the pools in Austin.
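The post doesn't spell out the exact per-pixel statistic in step 2, but steps 2 and 3 together imply something like the following: score every pixel against each reference color (here I assume similarity = negative Euclidean distance in RGB space), then keep the max and mean per color. A sketch with NumPy:

```python
import numpy as np

def color_features(img, colors):
    """For each reference color, compute (max, mean) similarity over all pixels.

    img: (H, W, 3) uint8 image array; colors: (K, 3) array of reference RGB colors.
    Similarity is the negative Euclidean distance in RGB space (my assumption),
    so blue-ish pixels score high against the blue reference colors.
    Returns a length-2K feature vector: K maxima followed by K means.
    """
    pixels = img.reshape(-1, 3).astype(np.float64)                           # (H*W, 3)
    dists = np.linalg.norm(pixels[:, None, :] - colors[None, :, :], axis=2)  # (H*W, K)
    sim = -dists
    return np.concatenate([sim.max(axis=0), sim.mean(axis=0)])
```

With 60 reference colors this yields the 120-dimensional feature vector described above, ready to feed into scikit-learn's ExtraTreesClassifier.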

visualization

Now that I have a list of (lat, long) coordinates for a bunch of pools, I need a way to visualize them. I wanted to show the locations of the pools on top of the original satellite imagery, so Mapbox's TileMill was the obvious way to go. The most time-consuming (but ultimately satisfying) part of this process was making the size and transparency of the markers depend on the zoom level. When zoomed out, the circles overlap and have the effect of a heatmap. When zoomed in, the circles are transparent and show the approximate location of each pool.
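The actual zoom-dependent styling lives in TileMill's CartoCSS rules, but the idea can be sketched as a simple interpolation. The function and all the numbers below are made up for illustration, not taken from the post:

```python
def marker_style(zoom, zmin=10, zmax=17):
    """Hypothetical sketch: interpolate marker radius and opacity with zoom level.

    Zoomed out: small, fairly opaque dots that overlap into a heatmap effect.
    Zoomed in: larger, more transparent circles marking each pool's rough location.
    """
    t = min(max((zoom - zmin) / (zmax - zmin), 0.0), 1.0)  # 0 at zmin, 1 at zmax
    radius = 2 + 18 * t     # pixels
    opacity = 0.8 - 0.5 * t
    return radius, opacity
```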
