Page 27 of 171

The genius of using ST_Subdivide to speed up PostGIS intersection comparisons

You should use ST_Subdivide to break up large shapes into smaller ones

This infographic visually compares the difference between running a PostGIS comparison query like ST_Intersects on a large shape versus a subdivided version of that large shape. Click to embiggen.

Hundreds of GIS intersection comparisons are completed every hour on Chicago Cityscape.*

People are looking at, say, a map of the South Shore community area. That “Place” page then grabs all of the building permits, building violations, business licenses, and other “feature layers” that are stored as points.

A classic “point in polygon” comparison is made using the ST_Intersects(place_geometry, permits_geometry) function.

This has worked very well for several years.

The problem

But as Chicago Cityscape handles larger shapes – they come from users drawing their own, large shapes, and from large shapes like the downtown Chicago area – this query doesn’t cut it.

Setting indexes on the geometry is imperative, but it’s not the end of the to optimize performance. That’s because the index of the geometry is a rectangular bounding box (which is also called an “envelope” in GIS) that contains the entire shape of the South Shore community area.

The downtown Chicago area, however, is not even the largest shape I have. That belongs to the new Place, “Neighborhood Opportunity Fund investment zones” (NOF). Combined, they cover 75 square miles of Chicago. Downtown is only 7.7 square miles.

After I added the NOF map and tested its Place page, it “crashed” my server, metaphorically speaking. The query to just count the number of building permits in the area would take about five minutes.

There had to be a better way; in the meantime, however, I divided the NOF map into the West and South sections. This hardly improved the counting time.

The solution

Thankfully, today, I saw a tweet from Paul Ramsey linking to his blog that linked to his slides from a recent presentation about the use of PostgreSQL to store and manipulate GIS data.

In it he explained how the ST_Subdividefunction worked. I’m going to demonstrate it using graphics from my own maps.

A normal intersection comparison, using ST_Intersects(place_geometry, permits_geometry) in a query creates a bounding box (envelope) around each geometry and quickly determines whether the two envelopes overlap. If they do, then it checks again to see if the actual geometries overlap. If they do, that data is returned as a response to your query.

When your two datasets are massive, like the NOF zones, which collectively cover 1/3rd of Chicago, and the building permits, which are found across the entire city…well, that led to the five minutes counting time.

Enter ST_Subdivide. To use it properly you would run it against your existing geometry and store the much smaller shapes, derived from the big shape, in a new table. I applied the function to all the 22,203 maps that Chicago Cityscape has and stored their unique IDs and subdivided geometries in a new “lookup” table.

Now, any time I want to compare the building permits against the NOF, the building permits are instead compared to the small shapes that were subdivided.

The query

Chicago Cityscape uses a single table (created as a materialized view) to combine all 22,203 maps. Each map is stored in a source table (for example, there’s a table to hold the 77 community areas) and the materialized view runs once a day to combine all of the maps in the source tables. This ensures our data is managed well: different source tables can hold different information, and the single table holds only the name, type, and geometry of the source tables, for faster comparison. Each entry in the single table also has a “slug”, its unique identifier.

Thus, the materialized view of the subdivided maps is created from the aforementioned single table, using this query:

create materialized view view_places_subdivided as
select gid || '_' || random() as gid, slug, st_subdivide(geom) as geom
from view_places;

The “gid” is designed to create a new unique ID field, as the slug field will be repeated for every subdivided of each map. A unique ID field is necessary if you want to refresh the materialized view concurrently (to allow for other queries to access the materialized view while it’s being refreshed).

* The results are cached for a few hours, because the feature layers change 1-2 times per day and at different times each day, so the limited duration cache accommodates that. Ideally I would code a way to invalidate the cache when the feature layer data is updated.

Update 12/31/19: ST_Subdivide will fail if your geometries have any or certain geometry errors (I don’t know if it’s any kind of error, or certain kinds of errors that make the function fail). Chicago Cityscape has over 37,000 features that ST_Subdivide is attempting to process, and there is a lot of room for error in managing that many features from dozens of sources.

Upzone the 606

Map of the single family-only zoning around the Bloomingdale Trail

The area in green only allows single-family houses to be built.

Something’s gotta give.

This is all of the land area within two blocks of the Bloomingdale Trail that allows only single-family housing to be built (view full map). This isn’t to say that multi-family housing doesn’t exist here; it definitely does, and there’s probably a handful of two-flats on a majority of the blogs.

All of the five parks of the 606 are within this two block radius, and 49.6 percent of the land allows only single-family housing to be built.

But why build a transportation corridor, a park, a new, expensive, public amenity, and not change the kind of housing – which often determines the kind of family and makeup of a household – that can afford to buy a home near here.

It’s already been shown that detached single-family housing prices have grown intensely the closer you get to the trail. That price growth has meant displacement for some, and “no chance to buy or build a house here” for many others.

There are still plenty of vacant lots within the mapped area; lots that should have a 2-4 unit building built on them, but where only a 1-unit building is allowed.

This map was made possible by the new Zoning Assessment tool on Chicago Cityscape. Read about it or use it now.

Yes, please, to Accessory Dwelling Units and adopting Vancouver’s policy

income property

There’s a couch house back there, providing an income opportunity for the owner of this single-family house. It’s hard to find photos of coach houses in Chicago because, given their position behind the house, it’s hard to see them from the street! Photo by Curtis Locke

I’m a huge opponent of how cities use zoning to keep densities very low and prevent people from moving into a neighborhood to enjoy high quality public schools and good access to transit. This is evidenced by many of my tweets about zoning analyses in Chicago over the last two weeks, and many blog posts I’ve written over the years.

I’m a proponent of Accessory Dwelling Units. In Chicago these are most commonly seen, in practice, as coach houses, which were built before most of us were born. ADUs, because they’re behind the primary building on a lot, are a nearly-hidden, low-impact way to provide affordable housing for a couple more people per lot without affecting the “character” of the neighborhood. And they generate rental income for the family that owns the primary building!

Coach houses, and ADUs, are illegal in Chicago. You’re allowed to keep the one you have, and it can be rented out to anyone else, so long as you don’t renovate it.

Bryn Davidson is an architect based in Vancouver, B.C., and his firm, Lanefab, designs ADUs in that city, a housing type that was legalized in 2009. He wrote an article in CityLab today and there are *so many quotable parts*.

In the article, Davidson offers five strategies for a city that’s developing an ADU legalization policy.

In the “Keep the approvals process simple” strategy, Davidson says that Vancouver’s policy means homeowners “don’t have to solicit feedback from neighbors”, adding, “The…is perhaps the most important. In North America we have a long history of granting neighbors truly extraordinary veto powers when it comes to adding new housing. Going forward, if we want to treat younger generations and renters more fairly, we need to stop trying to litigate housing on a lot-by-lot basis.”

This is one of the worst things about zoning today. Zoning is supposed to be the way that you tell property owners what they can expect to be able to build, and it’s a way for cities or residents to manage certain aspects about the way their area looks and who is living there.

But if everywhere in the city (cough Chicago cough) where people want to build is improperly zoned to begin with – for example, allowing only single-family houses near train stations in areas that have hundreds of apartment buildings that predate that zoning – you get a situation where so many property owners have to ask their city council member for a zoning change.

The next quotable is…the entire parking strategy. But here’s some choice parts:

  • “We argued at length about parking in Vancouver, but in the end, opted to require only one onsite parking space…”
  • “Some neighbors will get irate about the new competition for street parking, but here’s the counterpoint: If a neighbor is complaining about street parking, it’s because they’re using their garage…for something else”
  • “Either way, a lot of single-family-home residents are parking on city property for free while extracting extra value out of their private land.”

Chicago is experiencing gentrification, with rising property values and taxes in neighborhoods filled with households that can least afford it. Many of these households live in a single-family house – what do you think about giving them the opportunity to renovate and rent out an existing coach house in North Lawndale, or build a new coach house in Humboldt Park?

How much of the land within walking distance of a CTA station is zoned to allow multi-family housing?

I recently created the Zoning Assessment tool on Chicago Cityscape, which shows a map of aggregated zoning districts in a given community area, ward, or near a CTA or Metra station. Per Paul Angelone’s suggestion, you can now show the walk shed – the area within walking distance to the station, following the streets.

The maps in this post show where one can build apartments (including a simple and common two-flat) within a 15 minute walk to the Logan Square Blue Line station, which has 24-hour service. Try it yourself.

Thirty-one percent of the walk shed allows multi-family housing.

In a second version of the same map, I’ve marked in red the gaps in the zoning map. These are areas that are zoned to allow only single-family housing. That doesn’t make sense: The land near rapid transit stations should be much denser than the land away from the stations.

Sixty-four percent of the walk shed allows only single-family housing. The remaining five percent are planned developments (at least the Mega Mall is going to have a couple hundred dwelling units), manufacturing, and parks.

And if most of the block is already zoned to allow multi-family housing, why are these parcels skipped?

This is the same map as the one above, but with areas that allow only single-family housing marked in red (however, I skipped some areas to save myself time).

How it works
The walk shed boundaries are generated by Mapzen’s isochrone service. The Zoning Assessment map asks Mapzen for the polygon of a specified walk shed (walk or bike, 10 or 15 minutes), receives the polygon and sends that polygon to a custom API on the Cityscape server, which compares that to the server’s copy of Chicago’s latest zoning map. The comparison is then returned to the browser and replaces the default Zoning Assessment map.

Download GIS data from ArcGIS servers using a simple command

A lot of geospatial data (GIS) in the world is stored on ArcGIS MapServers, a part of the Esri “stack” of products that municipalities and other governments use to manage and publish GIS data.

A lot of people want that data, and not every government has enabled the ArcGIS data hub features that make it easier to get. For an example data hub, check out Cook County, Illinois.

If you have ArcGIS software on your Windows computer, then it can be pretty easy to plug in the map server URL and manipulate and extract the data.

For the rest of us who don’t have a very expensive license for that software, you can use a “command line” tool (written in Python) on any computer to download any layer of GIS data hosted on the ArcGIS MapServer and automatically convert it to GeoJSON.

Tutorial

You’ll need to install the Python package pyesridump, from the OpenAddresses GitHub repository, created by Ian Dees and other contributors.

Installing pyesridump is easy if you have pip installed, using the command pip install esridump.

The next thing you’ll need is the URL to a layer in a MapServer, and these are not easy to find.

Finding data to download

I can almost guarantee the county where you live has one. Before you continue, check to see if your county (or other jurisdiction) has the “open data portal” add-on to their ArcGIS stack.

A handful of suggestions:

  • Here are links to the open data portals enabled by Esri for Lake County, Illinois, and Broomfield County, Colorado.
  • A reader suggested looking for MapServers for jurisdictions around the world by looking through Esri’s portal of open data called ArcGIS Hub.
  • Use a search engine to search for the URL pattern that’s common to ArcGIS servers: paste “inurl:arcgis/rest/services” into a search engine. (You can also limit this search to a particular domain name, say, “chicago.gov”, by adding “site:chicago.gov” to the search query, making the full search query “inurl:arcgis/rest/services site:chicago.gov”.

Once you locate a dataset you want, you can find the MapServer URL under About>Data Source on the right side of the page.

I normally find them by looking at the HTML source code of a MapServer I already know about.

For this example I’ll use one of the GIS layers in the Cook County, Illinois, election service MapServer – here’s the layer for the Cook County commissioners districts.

Fetch the data

Once you have the URL the command is simple:

esri2geojson http://cookviewer1.cookcountyil.gov/ArcGIS/rest/services/cookElectnSrvc/MapServer/11 cookcounty_commissioners.geojson

  • The first term, esri2geojson tells your computer which program to load.
  • The second term is the URL of the MapServer URL.
  • The third term is the filename and location where you want to store the file. I prefer running the command “inside” the folder where I want the file to be stored. You can also specify a full path of the file. On a Mac this would look like ~/Users/username/Documents/GIS/projectname/cookcounty_commissioners.geojson

After you enter the command into your computer’s terminal, press enter. esri2geojson will report back once, after it finds and understands the MapServer URL you gave it. When it’s done, the command will “close” and your computer’s terminal will wait for the next command.

Do you have questions, or need some help? Leave a comment below.