Tag: Chicago Cityscape

How I used ST_ClusterDBSCAN to locate clusters of multiple, similar parcels

Alternative headline: A practical example of how to use ST_ClusterDBSCAN to find similar real estate properties.

Developers often want to acquire several adjacent lots for a single redevelopment. A standard-sized lot in Chicago is about 3,125 square feet (25 feet wide and 125 feet deep). Because of downzoning in 2004 and in the years since, the zoning rules for many lots allow only about 3-4 dwelling units each. Multiple lots are required to develop buildings with 6-9 dwelling units, which is a sweet spot in Chicago for design and for avoiding the need for an upzone.

Chicago Cityscape has long had Property Finder, a tool to locate parcels that meet exacting specifications given existing lot size, current zoning district, distance to transit, and other criteria.

Now, Chicago Cityscape can locate parcels that are adjacent or near each other that all meet the user’s specified criteria (what the website calls “filters”). This is possible because of the PostGIS function ST_ClusterDBSCAN.

ST_ClusterDBSCAN considers all geospatial features in your result set (whatever matches the WHERE clause) and assigns each one a cluster ID according to two inputs: the minimum cluster size, and the maximum distance a feature can be from another feature to be considered part of the same cluster as that feature.

The function can also assign a feature a cluster ID of NULL, indicating that the feature did not meet the clustering criteria and stands alone.
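To make the two inputs concrete, here is a minimal sketch of the DBSCAN idea in plain Python. It is not how PostGIS implements the function. It assumes each parcel is reduced to a single point, whereas PostGIS measures distance between the full geometries, and the function name and sample coordinates are made up for illustration.

```python
from math import hypot

def cluster_dbscan(points, eps, minpoints):
    """Return one cluster ID per point, or None for unclustered points."""
    n = len(points)
    # For each point, list every point within eps of it (itself included).
    neighbors = [
        [j for j in range(n)
         if hypot(points[i][0] - points[j][0], points[i][1] - points[j][1]) <= eps]
        for i in range(n)
    ]
    # A "core" point has at least minpoints points in its neighborhood.
    is_core = [len(nb) >= minpoints for nb in neighbors]
    labels = [None] * n
    next_id = 0
    for start in range(n):
        if not is_core[start] or labels[start] is not None:
            continue
        # Grow a new cluster outward from this unlabelled core point.
        labels[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in neighbors[i]:
                if labels[j] is None:
                    labels[j] = next_id
                    # Only core points keep expanding the cluster.
                    if is_core[j]:
                        stack.append(j)
        next_id += 1
    return labels

# Hypothetical points, in feet: three close together, a pair, and a loner.
lots = [(0, 0), (20, 0), (40, 0), (200, 0), (220, 0), (500, 0)]
print(cluster_dbscan(lots, eps=25, minpoints=3))
# → [0, 0, 0, None, None, None]
```

The pair at 200 and 220 feet is close enough together but falls short of the minimum cluster size of 3, so both points get None, just like the lone point at 500 feet.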

Show me what that looks like

Chicago Cityscape gives the user three options to cluster: Small, compact clusters with at least 3 properties each; small, compact clusters with at least 5 properties each; large, loose clusters with at least 10 properties each.

Additionally, Chicago Cityscape lets the user choose between showing parcels that weren’t found in a cluster, or hiding parcels that weren’t found in a cluster. The reason to show parcels that weren’t found in a cluster is to visualize where there are and aren’t clusters of parcels in the same map.

A map of Chicago’s Near West Side community area is shown with clusters of vacant lots. The “show all properties” mode is used, which shows clusters with a thick, black outline. Properties that were not in a cluster are still shown but without the thick black outline (enlarge the photo to see the difference).

Sample query

This query looks at all of the vacant lots within 1 mile of the intersection of Washington Boulevard and Karlov Avenue in the West Garfield Park community area of Chicago. The query looks for clusters of at least 3 features (“minpoints”) that are no more than 25 feet apart (“eps”). (The data are projected in Illinois StatePlane East Feet, rather than a projection that’s in meters because it’s easier for me to work with feet.)

I posted another sample query below that’s used to exclude all of the features that were not assigned to a cluster.

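-- Cluster vacant lots ('1-00') within 1 mile (5,280 feet) of the intersection.
SELECT pin14, ST_ClusterDBSCAN(geom, eps := 25, minpoints := 3) OVER () AS cid, geom
FROM parcels
WHERE property_class = '1-00'
    AND ST_DWithin(
        geom,
        -- The point is given in longitude/latitude (4326), then projected to feet (3435).
        ST_Transform(ST_GeomFromText('POINT(-87.7278 41.8819)', 4326), 3435),
        5280);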

The screenshot below shows clusters of vacant lots that resulted from the query above. The parcels symbolized in a gray gradient were not assigned to a cluster. Notice how clusters will form across the alleys but not across streets; this is because the streets are wider than 25 feet but most alleys are only 16 feet wide.

The map shows various groups (clusters) of vacant properties in West Garfield Park. Each cluster is symbolized in QGIS using a different color. Properties that are not in a cluster are symbolized by a gray gradient.

Exclusion sample query

This query is the same as above except that a Common Table Expression (CTE) is used (CTEs have the “WITH” keyword at the beginning) to create a subquery. The “WITH” subquery is the one that clusters the parcels and the following query (“SELECT *”) throws out any features returned by the subquery that don’t have a cluster ID (the “cid” field).

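-- The CTE is named "clustered" so it is not confused with the "parcels" table it reads from.
WITH clustered AS (
    SELECT pin14, ST_ClusterDBSCAN(geom, eps := 25, minpoints := 3) OVER () AS cid, geom
    FROM parcels
    WHERE property_class = '1-00'
        AND ST_DWithin(
            geom,
            ST_Transform(ST_GeomFromText('POINT(-87.7278 41.8819)', 4326), 3435),
            5280)
)
-- Keep only the parcels that were assigned to a cluster.
SELECT *
FROM clustered
WHERE cid IS NOT NULL;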

I would also recommend Dan Baston’s blog post from six years ago which has more commentary and explanation, and additional examples of how to use the function.

Chicago Crash Browser is back

ChicagoCrashes dot org was, for many years, the only source for people to get information about traffic crashes in Chicago. I started it in 2011.

Chicago Crash Browser v0.2
A screenshot of Chicago Crash Browser v0.2 showing what the website looked like on December 30, 2011.

It was updated annually with data from two years ago, because of how the Illinois Department of Transportation processed the reports from all over the state. I shut it down because it had outdated code, I was maintaining it in my free time, and I didn’t want to update the code or spend all the time every year integrating the new data.

In 2015, the Chicago Police Department started testing an electronic crash reporting system in some districts that meant police officers could write reports and they would immediately show up in a public database (in the city’s data portal). The CPD expanded this to all districts in September 2017. (A big caveat to using the new dataset is that it has citywide data for only four and a half years.)

Since then, whenever someone asked me for crash data (mostly John, to illustrate Streetsblog Chicago articles), I would head to the data portal and grab data for just the block or intersection where someone had recently been injured or killed. I would load the traffic crash data into QGIS and visualize it. I found this painstaking, too.

Now, with renewed attention on the common and unfixed causes of KSIs (“industry” term for killed or seriously injured) that we’re seeing repeatedly across Chicago – read about the contributing cause of Gerardo Marciales’s death – I decided to relaunch a version of Chicago Crash Browser.

The new version doesn’t have a name, because it’s part of the “Transportation Snapshot” in Chicago Cityscape, the real estate information platform I operate. It’s also behind a paywall, because that’s how Chicago Cityscape is built.

I wanted to make things a lot easier for myself this round and it comes with a lot of benefits:

  • Explore all crash reports in a given area, whether that’s one you draw yourself or predefined in the Cityscape database.
  • Quickly filter by crash type (bicyclist, pedestrian, etc.) and injury severity.
  • Download the data for further analysis.
A screenshot of a map and data table visualizing and describing traffic crash reports in Columbus Park.
What the crash data looks like within Chicago Cityscape.

How to access the Chicago Crash Browser

The crash data requires a Cityscape membership. I created a new tier of membership that you cannot sign up for yourself – I must grant it to you. It will give you access only to Transportation Snapshots.

  • Create a free account on Chicago Cityscape. The site uses only social networks for creating accounts.
  • Mention or DM me on Twitter, @stevevance, saying you’d like access to the crash data. Tell me the email address you used to create an account on Chicago Cityscape.
  • I’ll modify your membership to give you access to the “transportation tier” and tell you to sign out and sign back in to activate it.

Once you’re in, this video shows you how to draw a “Personal Place” and explore the traffic crash data there. Text instructions are below.

  1. From the Chicago Cityscape homepage, click on “Maps” in the menu bar and then click “Draw your own map”.
  2. On the “Personal Place” page that appears with a large map, decide which shape you’d like to draw: a circle with a radius that you specify (good for intersections), a square or rectangle (good for street blocks), or an arbitrary polygon (good for winding streets in parks). Click the shape and draw it according to the onscreen instructions. For intersections I recommend making the circle 150 feet for small intersections and 200 feet for large intersections; this is because intersections have an effect on driving beyond the box.
  3. Once you’ve completed drawing the shape, a popup window appears with the button to “view & save this Personal Place”. Click that button and a new browser tab will open with something called a “Place Snapshot”.
  4. In the Place Snapshot enter a name for your Personal Place and click the “Save” button.
  5. Scroll down and, under the “Additional Snapshots” heading, click the link for “Transportation & Jobs Snapshot”; a new browser tab will open.
  6. In Transportation Snapshot, scroll down and look for “Traffic crashes”. You’ve made it to the new Chicago Crash Browser.

How to visualize the density of point data in a grid

A common way to show the distribution of places (like grocery stores) is to use a heat map. The map will appear “hotter” where there are many grocery stores and “colder” where there are few grocery stores. This kind of map can be useful to show gaps in distribution or a neighborhood that has a lot of grocery stores.

One issue with that kind of heat map is that the coverage areas change their shape and color if you zoom in, since the algorithm that clusters or determines what’s “nearby” or dense has fewer locations to analyze.

I prefer to use grids in the shape of square tiles, since Chicago is a grid-oriented city and the vast majority of our streets and our routes move along east-west and north-south lines. The map above shows the location of subjects and topics of news articles in the Chicago Cityscape database.

I use PostGIS to set up most of my spatial data before visualizing it in QGIS.

This tutorial shows the two steps in PostGIS: (1) create a grid based on an existing area (polygon), and (2) assign each point location to a tile in that grid and count the number of locations in each tile.

If you don’t have PostGIS installed, you should install it if you work with spatial data a lot. It is much, much faster at performing most of the tasks you use QGIS or ArcGIS to perform. Both QGIS and ArcGIS can read and edit data stored in PostGIS.

Additionally, there is a function within QGIS that can create grids, and another function that can do comparisons by location and count/summarize, so all of this can be done without PostGIS.

For this tutorial, you will need a single polygon with which to create the grid. I used the boundary of the City of Chicago limits.

1. Create a grid based on an existing area

1.a. Add a new function to PostGIS

To create a grid, you need a function that draws the tiles based on the polygon. I got this from The Spatial Database Advisor.

-- Create required type
DROP   TYPE IF EXISTS T_Grid CASCADE;
CREATE TYPE T_Grid AS (
  gcol  int4,
  grow  int4,
  geom geometry
);
-- Drop function if exists
DROP FUNCTION IF EXISTS ST_RegularGrid(geometry, NUMERIC, NUMERIC, BOOLEAN);
-- Now create the function
CREATE OR REPLACE FUNCTION ST_RegularGrid(p_geometry   geometry,
                                          p_TileSizeX  NUMERIC,
                                          p_TileSizeY  NUMERIC,
                                          p_point      BOOLEAN DEFAULT TRUE)
  RETURNS SETOF T_Grid AS
$BODY$
DECLARE
   v_mbr   geometry;
   v_srid  int4;
   v_halfX NUMERIC := p_TileSizeX / 2.0;
   v_halfY NUMERIC := p_TileSizeY / 2.0;
   v_loCol int4;
   v_hiCol int4;
   v_loRow int4;
   v_hiRow int4;
   v_grid  T_Grid;
BEGIN
   IF ( p_geometry IS NULL ) THEN
      RETURN;
   END IF;
   v_srid  := ST_SRID(p_geometry);
   v_mbr   := ST_Envelope(p_geometry);
   v_loCol := trunc((ST_XMIN(v_mbr) / p_TileSizeX)::NUMERIC );
   v_hiCol := CEIL( (ST_XMAX(v_mbr) / p_TileSizeX)::NUMERIC ) - 1;
   v_loRow := trunc((ST_YMIN(v_mbr) / p_TileSizeY)::NUMERIC );
   v_hiRow := CEIL( (ST_YMAX(v_mbr) / p_TileSizeY)::NUMERIC ) - 1;
   FOR v_col IN v_loCol..v_hiCol Loop
     FOR v_row IN v_loRow..v_hiRow Loop
         v_grid.gcol := v_col;
         v_grid.grow := v_row;
         IF ( p_point ) THEN
           v_grid.geom := ST_SetSRID(
                             ST_MakePoint((v_col * p_TileSizeX) + v_halfX,
                                          (v_row * p_TileSizeY) + v_halfY),
                             v_srid);
         ELSE
           v_grid.geom := ST_SetSRID(
                             ST_MakeEnvelope((v_col * p_TileSizeX),
                                             (v_row * p_TileSizeY),
                                             (v_col * p_TileSizeX) + p_TileSizeX,
                                             (v_row * p_TileSizeY) + p_TileSizeY),
                             v_srid);
         END IF;
         RETURN NEXT v_grid;
     END Loop;
   END Loop;
END;
$BODY$
  LANGUAGE plpgsql IMMUTABLE
  COST 100
  ROWS 1000;

The ST_RegularGrid function works in the same projection as your source data.
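The column and row arithmetic in that function can be sketched in plain Python to see which tiles it produces. This is only an illustration under my own assumptions: the function name and the bounding box are made up, and it covers just the polygon (envelope) branch.

```python
import math

def regular_grid(xmin, ymin, xmax, ymax, tile_x, tile_y):
    """Return (col, row, (x0, y0, x1, y1)) for every tile covering the box."""
    # Same index arithmetic as the PL/pgSQL function: trunc for the low
    # index, ceil minus one for the high index.
    lo_col = math.trunc(xmin / tile_x)
    hi_col = math.ceil(xmax / tile_x) - 1
    lo_row = math.trunc(ymin / tile_y)
    hi_row = math.ceil(ymax / tile_y) - 1
    tiles = []
    for col in range(lo_col, hi_col + 1):
        for row in range(lo_row, hi_row + 1):
            x0, y0 = col * tile_x, row * tile_y
            tiles.append((col, row, (x0, y0, x0 + tile_x, y0 + tile_y)))
    return tiles

# Hypothetical bounding box in feet, tiled at 1,320 feet.
tiles = regular_grid(1000, 2000, 4000, 3000, 1320, 1320)
print(len(tiles))   # → 8
print(tiles[0])     # → (0, 1, (0, 1320, 1320, 2640))
```

Two things stand out. The tile edges snap to multiples of the tile size measured from the projection's origin, not from the corner of the input polygon. And the grid covers the polygon's bounding box (the function uses ST_Envelope), so some tiles can fall outside the polygon itself.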

1.b. Create a layer that has all of the tiles for just the Chicago boundary

--This creates grids of 1,320 feet square (a 2x2 block size in Chicago)
SELECT gcol, grow, geom
 into b_chicagoboundary_grid_1320squared
 FROM ST_RegularGrid((select geom from chicagoboundary where gid = 1), 1320, 1320,FALSE);

In that query, “1320” is a distance in feet for both the X and Y planes, as the “chicagoboundary” geometry is projected in Illinois StatePlane FIPS East (Feet) (EPSG/SRID 3435).

2. Assign each point location to a tile in that grid and count the number of locations in each tile

Now you’ll need a table that has POINT-type geometries in it. For the map in this tutorial, I used a layer of location-based news articles that are used in Chicago Cityscape to highlight local developments.

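-- Assumes the grid table has an "id" column (for example, a serial primary key
-- added after step 1.b, which creates only gcol, grow, and geom).
SELECT grid.id, count(*) AS count, grid.geom
INTO news_grid
FROM news_articles, b_chicagoboundary_grid_1320squared AS grid
WHERE ST_Intersects(news_articles.geom, grid.geom)
GROUP BY grid.id, grid.geom;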

This query will result in a table with three columns:

  1. The ID of the tile, which is a primary key field.
  2. The number of news articles in that tile.
  3. The POLYGON geometry of that tile.
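The counting step can also be sketched without a database. This is a rough Python equivalent under my own assumptions: the function name and coordinates are made up, and tiles are identified by a (column, row) pair instead of an ID. It is not identical to ST_Intersects, which would count a point lying exactly on a shared edge in both tiles.

```python
from collections import Counter
import math

def count_points_per_tile(points, tile_size):
    """Count how many points fall in each square tile, keyed by (col, row)."""
    counts = Counter()
    for x, y in points:
        # Floor division by the tile size gives the tile's column and row.
        counts[(math.floor(x / tile_size), math.floor(y / tile_size))] += 1
    return counts

# Hypothetical point locations in feet.
articles = [(100, 100), (1300, 50), (1400, 100), (2700, 2700), (2800, 3000)]
counts = count_points_per_tile(articles, 1320)
print(sorted(counts.items()))
# → [((0, 0), 2), ((1, 0), 1), ((2, 2), 2)]
```

As with the SQL query, tiles that contain no points do not appear in the result at all.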
Look at these two maps (the one above, and the one below). The first map shows the whole city. The tiles are colored according to the number of news articles within the area of each tile. The darker the blue, the more news articles within that tile.
This map is zoomed in to the Woodlawn area. As you change scale (zoom in or zoom out), the size of the “heat” area (the size of each tile) doesn’t change – they are still 1,320 feet by 1,320 feet. The color doesn’t change either. The typical heat map doesn’t have these advantages.

A new map for finding COVID vaccination sites in Illinois

The State of Illinois map of COVID vaccination sites is pretty bad. 

Screenshot of the Illinois Department of Public Health map, taken February 14, 2021.

It’s slow (it caused my browser tab to crash after a couple of minutes), has misspelled county and city names and missing ZIP code digits, and cannot be searched by address. There are duplicate entries, too.

I made a new version of the state’s COVID vaccination sites map.

I didn’t make any COVID maps earlier because I didn’t want to spend the time to ensure that I understood the right and wrong ways to map disease, because people make decisions based on maps and I don’t want my maps to end up harming anyone. 

The new map of COVID vaccination sites on Chicago Cityscape.

Aside from the state website’s usability issues, I’m very disappointed that there is zero data about COVID in the state’s #opendata portal.


These cities and counties have the most COVID vaccination sites, according to the IDPH’s dataset. 

For the top 10 or so, it seems to correlate with population. Except Skokie has 7 sites, and Evanston has 4, despite Evanston having 10,000 more residents. Nearly 100% of Illinois is within 60 minutes driving of the current COVID vaccination sites. (More are coming, at least in Chicago.) 

And a lot of Illinois is still within 45 minutes driving of the current COVID vaccination sites. Really big gaps in geography appear at the 30 minutes driving threshold.

A map of Illinois showing 30 minute driving areas around each of the 862 COVID vaccination sites.

I’m working with some people to show access via transit. This is super important. I predict that upwards of 75 percent of Chicagoans will be able to access a vaccination site or two within 45 minutes and 100 percent within 60 minutes.

Here’s another shortcoming of the state’s map: Each site’s unique ID is not persistent, making it difficult to compare one day’s list to the following day’s list. I got around that by making a “hash” of each vaccination site and comparing between two versions.

The map has been updated once since I started. The “hash” creates a unique ID based on the attributes of each vaccination site (name, address, city, county, ZIP code). Any time one of those attributes changes, the hash will also change and thus I can more easily find new or modified vaccination sites.
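Here is a minimal sketch of that hashing idea in Python. The field names, the SHA-256 algorithm, the separator, and the sample site are all assumptions for illustration; the actual implementation may differ.

```python
import hashlib

# Attributes that together identify a site (names are hypothetical).
FIELDS = ("name", "address", "city", "county", "zip")

def site_hash(site):
    """Build a stable ID from a site's attributes."""
    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently.
    joined = "|".join(str(site[field]) for field in FIELDS)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def new_or_modified(old_sites, new_sites):
    """Return the sites in new_sites whose hash is absent from old_sites."""
    old_hashes = {site_hash(site) for site in old_sites}
    return [site for site in new_sites if site_hash(site) not in old_hashes]

# A made-up site, and the same site after its ZIP code was corrected.
yesterday = [{"name": "Example Pharmacy", "address": "123 Main St",
              "city": "Springfield", "county": "Sangamon", "zip": "6270"}]
today = [{"name": "Example Pharmacy", "address": "123 Main St",
          "city": "Springfield", "county": "Sangamon", "zip": "62701"}]

changed = new_or_modified(yesterday, today)
print(len(changed))  # → 1
```

A hash alone cannot tell a modified site apart from a brand new one; both simply show up as hashes that were not in the previous list.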

Is it possible for us to “greenline” neighborhoods?

(I don’t mean extending the Green Line to its original terminal, to provide more transportation options in Woodlawn.)

Maps have been used to devalue neighborhoods and to excuse disinvestment. There should be maps, and narratives, to “greenline” – raise up – Chicago neighborhoods.

The Home Owners’ Loan Corporation “residential lending security” maps marked areas based on prejudicial characteristics and some objective traits of neighborhoods to assess the home mortgage lending risk. (View the Cook County maps.) The red and yellow areas have suffered almost continuously since the 1930s, and it could be based on the marking of these neighborhoods as red or yellow (there is some debate about the maps’ real effects).

The Home Owners’ Loan Corporation and its local consultants (brokers and appraisers, mostly) outlined areas and labeled them according to objective and subjective & prejudicial criteria in the 1930s. Each area is accompanied by a data sheet and narrative description. The image is a screenshot of the maps as hosted and presented on Chicago Cityscape.

The idea of “greenlining”

I might be thinking myopically, but what would happen if we marked *every* neighborhood in green, and talked about their strengths, and any historical and current disinvestment – actions that contribute to people’s distressed conditions today?

One aspect of this is a form of affirmative marketing – advertising yourself, telling your own story, in a more positive way than others have heard about you in the past.

In 1940, one area on the Far West Side of Chicago, in the Austin community area, was described as “Definitely Declining”, a “C” grade, like this:

This area is bounded on the north by Lake St., on the south by Columbus Park, and on the west by the neighboring village of Oak Park. The terrain is flat and the area is about 100% built up. There is heavy traffic along Lake St., Washington Blvd. Madison St., Austin Ave. (the western boundary) and Central Ave. (the eastern boundary).

High schools, grammar schools, and churches are convenient. Residents shop at fine shopping center in Oak Park. There are also numerous small stores along Lake St., and along Madison St. There are many large apartment buildings along the boulevards above mentioned, and these are largely occupied by Hebrew tenants. As a whole the area would probably be 20-25% Jewish.

Some of this migration is coming from Lawndale and from the southwest side of Chicago. Land values are quite high due to the fact that the area is zoned for apartment buildings. This penalizes single family occupancy because of high taxes based on exclusive land values, which are from $60-80 a front foot, altho one authority estimates them at $100 a front foot. An example of this is shown where HOLC had a house on Mason St. exposed for sale over a (over) period of two years at prices beginning at $6,000 and going down to $4,500. it was finally sold for $3,800. The land alone is taxed based on a valuation exceeding that amount. This area is favored by good transportation and by proximity to a good Catholic Church and parochial school.

There are a few scattered two flats in which units rent for about $55. Columbus Park on the south affords exceptional recreational advantages. The Hawthorne Building & Loan, Bell Savings Building & Loan, and Prairie State Bank have loaned in this area, without the FHA insurance provision. The amounts are stated to be up to 50% and in some cases 60%, of current appraisals.

Age, slow infiltration, and rather indifferent maintenance have been considered in grading this area “C”.

Infiltration is a coded reference to people of color, and Jews.

My questions about how to “greenline” a neighborhood

  1. How would you describe this part of Austin today to stand up for the neighborhood and its residents, the actions taken against them over decades, and work to repair these?
  2. How do you change the mindset of investors (both small and large, local and far) to see the advantages in every neighborhood rather than rely on money metrics?
  3. What other kinds of data can investors use in their pro formas to find the positive outlook?
  4. What would these areas look like today if they received the same level of investment (per square mile, per student, per resident, per road mile) as green and blue areas? How great was the level of disinvestment from 1940-2018?

In the midst of writing this, Paola Aguirre pointed me to another kind of greenlining that’s been proposed in St. Louis. A new anti-segregation report from For the Sake of All recommended a “Greenlining Fund” that would pay to cover the gap between what the bank is appraising a house for and what the sales price is for a house, so that more renters and Black families can buy a house in their neighborhoods.

That “greenlining” is a more direct response to the outcome of redlining: It was harder to get a mortgage in a red area. My idea of greenlining is to come up with ways to convince people who have a hard time believing it that these are people and places worth investing in.


The Digital Scholarship Lab at the University of Richmond digitized the HOLC maps and published them on their Mapping Inequality website as well as provided the GIS data under a Creative Commons license.