Category: Data

A kludge to build a rental registry in Cook County 

Chicago should have a rental registry, a database of dwelling units that are rented to tenants, for at least two reasons:

  1. The city can know things about the rental units, including how much they cost, where they are, and if any are vacant and could be occupied if only people knew they were available and how to get in touch with the owner.
  2. The city can know who the owners are and contact them to issue citations or advise them, or fill out for them, emergency rental assistance during pandemics and other times of necessity.

Building and administering a rental registry from scratch would be very expensive – probably tens of millions to start and more than one million annually.

I propose a kludge that uses existing databases and modifies existing standard operating procedures amongst a small group of Cook County and Chicago agencies. A kludge is a workaround. It has other meanings and an uncertain etymology.

A 9-unit apartment building in Little Italy is undergoing renovation.

The kludge has four parts

1. Incorporate data about the number of units declared on Real Estate Transfer Tax forms (which in Cook and many other counties are transmitted to the Illinois Department of Revenue digitally).

There is already a city office that reviews or audits these forms looking for instances where the buyer or seller incorrectly claimed certain exemptions from RETT, because of how the city can lose revenue. That office can also enforce that the number of units was correctly entered on the form. 

2. For banks that hold city deposits, amend legislation to require that their newly issued or refinanced mortgages specify the number of units in the required submitted documentation. The ordinance that regulates banks that hold city deposits was amended a few years ago to require that they report how many loans they issue in Chicago for both commercial and residential properties.

Databases 1 and 2 are checks for each other. 

3. “Hire” the Cook County Assessor’s Office to create and operate the database for the unit count data from 1 and 2 (likely as an augmentation of their existing database).

The database would also store any data the CCAO collects through the commercial valuation data they obtain from third party sources as well as from the owners who volunteer it (Assessor Kaegi is already collecting and publicly publishing this information). 

At this point, with features 1, 2, and 3, we are assembling a pretty broad but incomplete record of where rental units are. It will be come more complete over time as properties transfer (sell) and the details of the transfer (sale), and the properties themselves, are recorded.

It doesn’t have a clue as to the rental prices

4. The Cook County Assessor’s Office creates new property classifications. Property classifications allow for the comparison of like buildings for the purpose of establishing assessed values for all properties that are not tax exempt.

One of the most common classifications in Chicago is “2-11”, for apartment buildings with two to six units. This means that, generally, the value of the ubiquitous two-flats and three-flats get compared to other each other and sometimes to four-flats, etc.

I suggest that there should be a few new property classifications, but I have only one idea so far: classify limited equity and Chicago Housing Trust properties differently. 

Bickerdike is one organization that built a lot of limited equity row houses and detached houses in the 1990s and 2000s but I am not aware of a publicly accessible database identifying them.

These houses represent permanently affordable housing and we should have a better system to track them!

This screenshot of part of a spreadsheet is the apartments data that the Cook County Assessor’s Office collected for the 2021 tax year. 

How broad is the kludge?

  • Using the Real Estate Transfer Tax data from 2022 Q1 to Q3, there were 3,550 buildings in Chicago having 22,217 units transferred. (I don’t know how many were arms length transactions, meaning they were sold to new owners.)
  • In the CCAO’s apartments data collected for the Rogers Park Township, there is semi-detailed information about 715 buildings that have seven or more apartments comprising 18,541 units. Details include the unit size breakdown by bedroom count.

Chicago has 556,099 rented dwelling units in buildings with two or more units (according to the ACS 2021 1-year estimate). In my limited analysis I’ve already found data about 7.4 percent of them, and that’s only for part of the city [1].


[1] There may also be duplicates between the buildings in the RETT database and the CCAO apartments dataset.

Creating a PostgreSQL PostGIS function to get around a DataTables Editor limitation

DataTables is a fantastic software that turns any HTML table into an interactive and editable one.

Chicago Cityscape uses DataTables on pretty much every page. DataTables also provides server-side processing for PHP to grab the right data from a database and show it to the user, 10 records at a time (the number of records can be changed by the user to show more records at a time).

Screenshot showing my new function, using the function, and the results.


One of the problems I’ve had to get around is that the DataTables Editor script recognizes a SELECT statement with only one function per field. If there’s a second function that’s manipulating a field in the SELECT statement then the ALIAS won’t work and the JavaScript will not be able to read and show the data properly.

I almost always use the two functions ST_AsGeoJSON and ST_Transform on my geographic data to (1) transform it from the SRID of 3435 to 4326 for displaying on web mercator maps, and (2) converting the GIS data into GeoJSON strings for easy transference to the Leaflet JavaScript library.

This is a known issue – see this thread on the DataTables forums – that Allan, the DataTables creator, has acknowledged and provided an alternative solution for.


It turns out that it’s easy to write a function that combines both functions.

CREATE OR REPLACE FUNCTION ST_TransformAsGeoJSON (field geometry, srid int =4326, simplify int =5)
  RETURNS text
RETURN ST_AsGeoJSON(ST_Transform(field, srid), simplify);

The code above is a function I call ST_TransformAsGeoJSON that combines the two functions I already described, and is flexible by letting the user specify in the arguments the table and field, the SRID to transform (reproject) to, and the simplify variable that can be used in ST_AsGeoJSON that tells it how many decimal points to use in coordinates.

Avoid “VARCHAR” columns in PostgreSQL [PGSQL Phriday #006]

This is a response to Grant.

This is one thing I wish I knew while learning PostgreSQL: Don’t ever use the VARCHAR column type. I have never encountered a situation where that would have been superior to using TEXT.

The problem with VARCHAR column type is that it can be a fixed-width column that doesn’t allow text longer than a certain number of characters. This maximum character length can be introduced by the person who creates the table and adds the column.

The maximum character length can also be created by a GIS program (specifically, QGIS and the ogr2ogr command line tool) as well as some online tools that can convert spreadsheets to SQL and even write the table definition and “CREATE TABLE” statement. In both cases, the program tried to guess the column type and seemed to default to VARCHAR(y character length) instead of TEXT.

The maximum character length is a problem because trying to insert more text than allowed into that field will encounter an error and the INSERT or UPDATE statement will fail. I think it’s also a bad idea to create these columns because it makes it expensive to change your mind later about what length of text will go into that field.

Other things I wish I knew earlier

  • Using transactions to test your query and its affect on your data so that you can easily undo any changes.
  • How and when to use indexes beyond the “btree” index type (for example, “gin” for array and JSONB columns).
  • Add “time_created_at” and “time_updated_at” columns to basically every dataset. This makes it easy to sort things later; sometimes when creating a new dataset it’s not always obvious how you might want to display or manipulate the data until days, weeks, or months later. Sorting data by “when was the information changed last” is a common use case for me.

How I used ST_ClusterDBSCAN to locate clusters of multiple, similar parcels

Alternative headline: A practical example of how to use ST_ClusterDBSCAN to find similar real estate properties.

Oftentimes a developer wants to acquire several adjacent lots for a single redevelopment. Each standard sized lot in Chicago is about 3,125 square feet (25 feet wide and 125 feet deep). Because of downzoning in 2004, and since, the zoning rules for many lots allow only about 3-4 dwelling units each. Multiple lots are required to develop buildings with 6-9 dwelling units, which is a sweet spot in Chicago for design and avoiding having to get an upzone.

Chicago Cityscape has long had Property Finder, a tool to locate parcels that meet exacting specifications given existing lot size, current zoning district, distance to transit, and other criteria.

Now, Chicago Cityscape can locate parcels that are adjacent or near each other that all meet the user’s specified criteria (what the website calls “filters”). This is possible because of the PostGIS function ST_ClusterDBSCAN.

ST_ClusterDBSCAN considers all geospatial features in your result set (whatever matches the WHERE clause) and assigns them to a cluster ID according to two inputs: minimum cluster size, and maximum distance each feature can be from any other feature in order to be considered in the same cluster as that other feature.

The function can also assign a feature with a cluster ID of NULL, indicating that the feature did not meet the clustering criteria and is alone.

Show me what that looks like

Chicago Cityscape gives the user three options to cluster: Small, compact clusters with at least 3 properties each; small, compact clusters with at least 5 properties each; large, loose clusters with at least 10 properties each.

Additionally, Chicago Cityscape lets the user choose between showing parcels that weren’t found in a cluster, or hiding parcels that weren’t found in a cluster. The reason to show parcels that weren’t found in a cluster is to visualize where there are and aren’t clusters of parcels in the same map.

A map of Chicago’s Near West Side community area is shown with clusters of vacant lots. The “show all properties” mode is used, which shows clusters with a thick, black outline. Properties that were not in a cluster are still shown but without the thick black outline (enlarge the photo to see the difference).

Sample query

This query looks at all of the vacant lots within 1 mile of the intersection of Washington Boulevard and Karlov Avenue in the West Garfield Park community area of Chicago. The query looks for clusters of at least 3 features (“minpoints”) that are no more than 25 feet apart (“eps”). (The data are projected in Illinois StatePlane East Feet, rather than a projection that’s in meters because it’s easier for me to work with feet.)

I posted another sample query below that’s used to exclude all of the features that were not assigned to a cluster.

SELECT pin14, ST_ClusterDBSCAN(geom, eps := 25, minpoints := 3) over () AS cid, geom
FROM parcels
WHERE property_class = '1-00'
	AND ST_DWithin(geom,
            ST_GeomFromText('POINT(-87.7278 41.8819)', 4326), 3435),

The screenshot below shows clusters of vacant lots that resulted from the query above. The parcels symbolized in a gray gradient were not assigned to a cluster. Notice how clusters will form across the alleys but not across streets; this is because the streets are wider than 25 feet but most alleys are only 16 feet wide.

The map shows various groups (clusters) of vacant properties in West Garfield Park. Each cluster is symbolized in QGIS using a different color. Properties that are not in a cluster are symbolized by a gray gradient.

Exclusion sample query

This query is the same as above except that a Common Table Expression (CTE) is used (CTEs have the “WITH” keyword at the beginning) to create a subquery. The “WITH” subquery is the one that clusters the parcels and the following query (“SELECT *”) throws out any features returned by the subquery that don’t have a cluster ID (the “cid” field).

with parcels as (
SELECT pin14, ST_ClusterDBSCAN(geom, eps := 25, minpoints := 3) over () AS cid, geom
FROM parcels
WHERE property_class = '1-00'
	AND ST_DWithin(geom,
            ST_GeomFromText('POINT(-87.7278 41.8819)', 4326), 3435),
) select * 
from parcels where cid is not null;

I would also recommend Dan Baston’s blog post from six years ago which has more commentary and explanation, and additional examples of how to use the function.

Chicago Crash Browser updates with stats, filters, and news article links

Today I’m adding a bunch of new features to the Chicago Crash Browser, which lives on Chicago Cityscape.

But first…special access is no longer required. Anyone can create a free Cityscape account and access the map. However, only those with special access or a Cityscape Real Estate Pro account will be able to download the data.

Five new features include:

  • Statistics that update weekly to summarize what happened in the past week: the number of crashes, the number of people killed in crashes, and the number of people with the two worst tiers of injuries. The statistics are viewable to everyone, including those without access to the crash browser.
screenshot of the new weekly statistics
The statistics will update every Sunday. The numbers may change throughout the week as Chicago police officers upload crash reports.
  • For data users, the crash record ID is viewable. The crash record ID links details about the same crash across the Chicago data portal’s three tables: Crashes, Vehicles, and People. My Chicago Crash Browser is currently only using the Crashes table. Click on the “More details” arrow in the first table column.
screenshot of the data table showing the crash record ID revealed.
The crash record ID is hidden by default but can be exposed. Use this ID to locate details in the data portal’s Vehicles and People tables.
  • Filter crashes by location. There are currently two location filters: (1) on a “Pedestrian Street” (a zoning designation to, over time, reduce the prevalence of car-oriented land uses and improve building design to make them more appealing to walk next to); (2) within one block of a CTA or Metra station, important places where people commonly walk to. Select a filter’s radio button and then click “Apply filters”.
  • Filter crashes by availability of a news article or a note. I intend to attach news articles to every crash where a pedestrian or bicyclist was killed (the majority of these will be to Streetsblog Chicago articles, where I am still “editor at large”. Notes will include explanations about data changes [1] (the “map editor” mentioned in some of the notes is me) and victims’ names.
screenshot of the two types of filters

After choosing a filter’s radio button click “Apply filters” and the map and data table will update.
  • Filter by hit and run status. If the officer filling out the crash report marked it as a hit and run crash, you can filter by choosing “Yes” in the options list. “No” is another option, as is “not recorded”, which means the officer didn’t select yes or no.
  • Search by address. Use the search bar inside the map view to center the map and show crashes that occurred within one block (660 feet) of that point. The default is one block and users can increase that amount using the dropdown menu in the filter.
screenshot of the map after the search by address function has been used
Use the search bar within the map view to show crashes near a specific address in Chicago.


[1] The most common data change as of this writing is when a crash’s “most severe injury” is upgraded from non-fatal to fatal, but the crash report in the city’s data portal does not receive that update. This data pipeline/publishing issue is described in the browser’s “Crash data notes” section.

The “map editor” (me) will change a crash’s “most severe injury” to fatal to ensure it appears when someone filters for fatal crashes. This change to the data will be noted.