by @RyanKeisler January 18, 2014
With the launch of small, low-cost imaging satellites from the likes of SkyBox and Planet Labs, the amount of publicly accessible satellite imagery is about to explode. Unless you're interested in a very specific location, manually analyzing these images is going to be difficult; you need some machine vision to do the heavy lifting for you.
For this weekend hack I decided to look for swimming pools in Austin, TX. Swimming pools because they're easy, and Austin because it's a city I know well. Code available here.
Hey it works - The "pools" that the algorithm finds are usually real pools. No bogus pools show up in the big undeveloped areas, like the Barton greenbelt. Good.
West - There are way more pools in west Austin than in the other parts of town. The West Lake Hills neighborhood is one of the more affluent parts of town, so it isn't surprising that there are more pools out there.
Wow - I was a little surprised at the degree of opulence of some of the homes-with-pools in west Austin: private tennis courts, putting greens, and my personal favorite, one's very own garden labyrinth.
Downtown - I wasn't expecting to see very many pools downtown, because, well, you don't see any pools when you're walking around there. But there are actually quite a few, many hiding on top of high-rise condo buildings.
Not quite right - The classifier has some interesting failures, like this water-treatment plant or this covered airport parking.
Takeaway - This was fun. I was glad to see that some relatively simple machine learning could be used to find objects in satellite imagery. Feedback is welcome via rkeisler@gmail.com or @RyanKeisler.
First I defined a big rectangle centered on downtown Austin. After learning how to convert from latitude/longitude/zoom to the X/Y/Z tiling scheme, I downloaded 66,000 images from Mapbox (= 66,000 urlretrieve calls). Satellite imagery isn't available in the free Mapbox plan, so I had to cough up $5.
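For reference, here's a minimal sketch of that latitude/longitude-to-tile conversion and the download loop, assuming Python 3. The bounding box, zoom level, Mapbox URL pattern, and access token are illustrative placeholders, not the values actually used in this project (the 2014 Mapbox endpoint was different).

```python
import math
import os
from urllib.request import urlretrieve

def deg2tile(lat_deg, lon_deg, zoom):
    # Standard web-mercator ("slippy map") tile indices.
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# Illustrative bounding box around central Austin (not the exact
# rectangle from the post) at an assumed zoom level.
zoom = 18
x0, y0 = deg2tile(30.32, -97.82, zoom)  # northwest corner
x1, y1 = deg2tile(30.22, -97.68, zoom)  # southeast corner

os.makedirs("tiles", exist_ok=True)
for x in range(x0, x1 + 1):
    for y in range(y0, y1 + 1):
        # Placeholder raster-tile URL and token.
        url = (f"https://api.mapbox.com/v4/mapbox.satellite/"
               f"{zoom}/{x}/{y}.png?access_token=YOUR_TOKEN")
        urlretrieve(url, f"tiles/{zoom}_{x}_{y}.png")
```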
I chose to search for swimming pools because they're really easy to find. There just aren't a lot of other bus-sized, aquamarine blobs in these images.
The algorithm is pretty simple: the basic idea was to hand-engineer features that correspond to contiguous chunks of blue pixels, then feed those features to an ExtraTreesClassifier (a random forest variant) from the amazing scikit-learn machine learning package, which I chose because it's easy to use and robust to outlying data. A rough sketch of this pipeline is included below.

Now that I had a list of (lat, long) for a bunch of pools, I needed a way to visualize them. I wanted to show the locations of the pools on top of the original satellite imagery, so Mapbox's TileMill was the obvious way to go. The most time-consuming (but ultimately satisfying) part of this process was making the size and transparency of the markers depend on the zoom level. When zoomed out, the circles overlap and have the effect of a heatmap. When zoomed in, the circles are transparent and show the approximate location of the pool.
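Here's a minimal sketch of what that pipeline might look like. The HSV thresholds, the exact feature set, and the training setup below are my assumptions for illustration, not the post's exact recipe; the classifier would be trained on some set of labeled tiles (1 = contains a pool, 0 = no pool).

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv
from scipy import ndimage
from sklearn.ensemble import ExtraTreesClassifier

def pool_features(rgb_tile):
    """Features measuring contiguous chunks of pool-colored pixels.
    The HSV thresholds are illustrative guesses."""
    hsv = rgb_to_hsv(rgb_tile.astype(float) / 255.0)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    # "Pool-like": cyan-ish hue, reasonably saturated and bright.
    poolish = (h > 0.45) & (h < 0.60) & (s > 0.3) & (v > 0.5)
    labels, n_blobs = ndimage.label(poolish)  # group into contiguous blobs
    blob_sizes = np.bincount(labels.ravel())[1:] if n_blobs else np.array([0])
    return [poolish.mean(), n_blobs, blob_sizes.max()]

# Hypothetical training step on hand-labeled data; `tiles` and
# `labels_train` are placeholders for that labeled set.
# X = np.array([pool_features(t) for t in tiles])
# clf = ExtraTreesClassifier(n_estimators=100)
# clf.fit(X, labels_train)
# has_pool = clf.predict(X)
```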
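To get the detections into TileMill, the pool locations need to be in a format it can load as a layer, such as GeoJSON. Here's a minimal export sketch (the `pools` list is a placeholder); the zoom-dependent marker styling itself is done in TileMill's CartoCSS and isn't shown here.

```python
import json

# Placeholder: (lat, long) pairs produced by the classifier.
pools = [(30.2672, -97.7431), (30.2955, -97.8080)]

geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         # Note: GeoJSON coordinate order is [long, lat].
         "geometry": {"type": "Point", "coordinates": [lon, lat]},
         "properties": {}}
        for lat, lon in pools
    ],
}

with open("pools.geojson", "w") as f:
    json.dump(geojson, f)
```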