Alternative headline: A practical example of how to use ST_ClusterDBSCAN to find similar real estate properties.

Oftentimes a developer wants to acquire several adjacent lots for a single redevelopment. Each standard sized lot in Chicago is about 3,125 square feet (25 feet wide and 125 feet deep). Because of downzoning in 2004, and since, the zoning rules for many lots allow only about 3-4 dwelling units each. Multiple lots are required to develop buildings with 6-9 dwelling units, which is a sweet spot in Chicago for design and avoiding having to get an upzone.

Chicago Cityscape has long had Property Finder, a tool to locate parcels that meet exacting specifications given existing lot size, current zoning district, distance to transit, and other criteria.

Now, Chicago Cityscape can locate parcels that are adjacent or near each other that all meet the user’s specified criteria (what the website calls “filters”). This is possible because of the PostGIS function ST_ClusterDBSCAN.

ST_ClusterDBSCAN considers all geospatial features in your result set (whatever matches the WHERE clause) and assigns them to a cluster ID according to two inputs: minimum cluster size, and maximum distance each feature can be from any other feature in order to be considered in the same cluster as that other feature.

The function can also assign a feature with a cluster ID of NULL, indicating that the feature did not meet the clustering criteria and is alone.

Show me what that looks like

Chicago Cityscape gives the user three options to cluster: Small, compact clusters with at least 3 properties each; small, compact clusters with at least 5 properties each; large, loose clusters with at least 10 properties each.

Additionally, Chicago Cityscape lets the user choose between showing parcels that weren’t found in a cluster, or hiding parcels that weren’t found in a cluster. The reason to show parcels that weren’t found in a cluster is to visualize where there are and aren’t clusters of parcels in the same map.

A map of Chicago’s Near West Side community area is shown with clusters of vacant lots. The “show all properties” mode is used, which shows clusters with a thick, black outline. Properties that were not in a cluster are still shown but without the thick black outline (enlarge the photo to see the difference).

Sample query

This query looks at all of the vacant lots within 1 mile of the intersection of Washington Boulevard and Karlov Avenue in the West Garfield Park community area of Chicago. The query looks for clusters of at least 3 features (“minpoints”) that are no more than 25 feet apart (“eps”). (The data are projected in Illinois StatePlane East Feet, rather than a projection that’s in meters because it’s easier for me to work with feet.)

I posted another sample query below that’s used to exclude all of the features that were not assigned to a cluster.

SELECT pin14, ST_ClusterDBSCAN(geom, eps := 25, minpoints := 3) over () AS cid, geom
FROM parcels
WHERE property_class = '1-00'
	AND ST_DWithin(geom,
        ST_Transform(
            ST_GeomFromText('POINT(-87.7278 41.8819)', 4326), 3435),
           5280)

The screenshot below shows clusters of vacant lots that resulted from the query above. The parcels symbolized in a gray gradient were not assigned to a cluster. Notice how clusters will form across the alleys but not across streets; this is because the streets are wider than 25 feet but most alleys are only 16 feet wide.

The map shows various groups (clusters) of vacant properties in West Garfield Park. Each cluster is symbolized in QGIS using a different color. Properties that are not in a cluster are symbolized by a gray gradient.

Exclusion sample query

This query is the same as above except that a Common Table Expression (CTE) is used (CTEs have the “WITH” keyword at the beginning) to create a subquery. The “WITH” subquery is the one that clusters the parcels and the following query (“SELECT *”) throws out any features returned by the subquery that don’t have a cluster ID (the “cid” field).

with parcels as (
SELECT pin14, ST_ClusterDBSCAN(geom, eps := 25, minpoints := 3) over () AS cid, geom
FROM parcels
WHERE property_class = '1-00'
	AND ST_DWithin(geom,
        ST_Transform(
            ST_GeomFromText('POINT(-87.7278 41.8819)', 4326), 3435),
           5280)
) select * 
from parcels where cid is not null;

I would also recommend Dan Baston’s blog post from six years ago which has more commentary and explanation, and additional examples of how to use the function.