by @RyanKeisler January 18, 2014
With the launch of small, low-cost imaging satellites from the likes of SkyBox and Planet Labs, the amount of publicly accessible satellite imagery is about to explode. Unless you're interested in a very specific location, manually analyzing these images is going to be difficult; you need some machine vision to do the heavy lifting for you.
For this weekend hack I decided to look for swimming pools in Austin, TX. Swimming pools because they're easy, and Austin because it's a city I know well. Code available here.
Hey it works - The "pools" that the algorithm finds are usually real pools. No bogus pools show up in the big undeveloped areas, like the Barton greenbelt. Good.
West - There are way more pools in west Austin than in the other parts of town. The West Lake Hills neighborhood is one of the more affluent parts of town, so it isn't surprising that there are more pools out there.
Wow - I was a little surprised at the degree of opulence of some of the homes-with-pools in west Austin: private tennis courts, putting greens, and my personal favorite, one's very own garden labyrinth.
Downtown - I wasn't expecting to see very many pools downtown, because, well, you don't see any pools when you're walking around there. But there are actually quite a few, many hiding on top of high-rise condo buildings.
Not quite right - The classifier has some interesting failures, like this water-treatment plant or this covered airport parking.
Takeaway - This was fun. I was glad to see that some relatively simple machine learning could be used to find objects in satellite imagery. Feedback is welcome via rkeisler@gmail.com or @RyanKeisler.
First I defined a big rectangle centered on downtown Austin. After learning how to convert from latitude/longitude/zoom to the X/Y/Z tiling scheme, I downloaded 66,000 images from Mapbox (= 66,000 urlretrieve calls). Satellite imagery isn't available in the free Mapbox plan, so I had to cough up $5.
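For reference, here's a minimal sketch of that latitude/longitude-to-tile conversion and the download loop, assuming Python 3. The bounding box, zoom level, Mapbox URL pattern, and access token are illustrative placeholders, not the values actually used in this project (the 2014 Mapbox endpoint was different).

```python
import math
import os
from urllib.request import urlretrieve

def deg2tile(lat_deg, lon_deg, zoom):
    # Standard web-mercator ("slippy map") tile indices.
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# Illustrative bounding box around central Austin (not the exact
# rectangle from the post) at an assumed zoom level.
zoom = 18
x0, y0 = deg2tile(30.32, -97.82, zoom)  # northwest corner
x1, y1 = deg2tile(30.22, -97.68, zoom)  # southeast corner

os.makedirs("tiles", exist_ok=True)
for x in range(x0, x1 + 1):
    for y in range(y0, y1 + 1):
        # Placeholder raster-tile URL and token.
        url = (f"https://api.mapbox.com/v4/mapbox.satellite/"
               f"{zoom}/{x}/{y}.png?access_token=YOUR_TOKEN")
        urlretrieve(url, f"tiles/{zoom}_{x}_{y}.png")
```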
I chose to search for swimming pools because they're really easy to find. There just aren't a lot of other bus-sized, aquamarine blobs in these images.
The algorithm is pretty simple: the basic idea was to hand-engineer features that correspond to contiguous chunks of blue pixels, then feed those features to an ExtraTreesClassifier (a random forest variant) from the amazing scikit-learn machine learning package, which I chose because it's easy to use and robust to outlying data. A rough sketch of this pipeline is included below.

Now that I had a list of (lat, long) for a bunch of pools, I needed a way to visualize them. I wanted to show the locations of the pools on top of the original satellite imagery, so Mapbox's TileMill was the obvious way to go. The most time-consuming (but ultimately satisfying) part of this process was making the size and transparency of the markers depend on the zoom level. When zoomed out, the circles overlap and have the effect of a heatmap. When zoomed in, the circles are transparent and show the approximate location of the pool.
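Here's a minimal sketch of what that pipeline might look like. The HSV thresholds, the exact feature set, and the training setup below are my assumptions for illustration, not the post's exact recipe; the classifier would be trained on some set of labeled tiles (1 = contains a pool, 0 = no pool).

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv
from scipy import ndimage
from sklearn.ensemble import ExtraTreesClassifier

def pool_features(rgb_tile):
    """Features measuring contiguous chunks of pool-colored pixels.
    The HSV thresholds are illustrative guesses."""
    hsv = rgb_to_hsv(rgb_tile.astype(float) / 255.0)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    # "Pool-like": cyan-ish hue, reasonably saturated and bright.
    poolish = (h > 0.45) & (h < 0.60) & (s > 0.3) & (v > 0.5)
    labels, n_blobs = ndimage.label(poolish)  # group into contiguous blobs
    blob_sizes = np.bincount(labels.ravel())[1:] if n_blobs else np.array([0])
    return [poolish.mean(), n_blobs, blob_sizes.max()]

# Hypothetical training step on hand-labeled data; `tiles` and
# `labels_train` are placeholders for that labeled set.
# X = np.array([pool_features(t) for t in tiles])
# clf = ExtraTreesClassifier(n_estimators=100)
# clf.fit(X, labels_train)
# has_pool = clf.predict(X)
```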
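To get the detections into TileMill, the pool locations need to be in a format it can load as a layer, such as GeoJSON. Here's a minimal export sketch (the `pools` list is a placeholder); the zoom-dependent marker styling itself is done in TileMill's CartoCSS and isn't shown here.

```python
import json

# Placeholder: (lat, long) pairs produced by the classifier.
pools = [(30.2672, -97.7431), (30.2955, -97.8080)]

geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         # Note: GeoJSON coordinate order is [long, lat].
         "geometry": {"type": "Point", "coordinates": [lon, lat]},
         "properties": {}}
        for lat, lon in pools
    ],
}

with open("pools.geojson", "w") as f:
    json.dump(geojson, f)
```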