Category: Research/Study

Who are the top property owners in Cook County

235 West Van Buren Street

There are several hundred condo units in the building at 235 W Van Buren Street, and each unit is associated with multiple Property Index Numbers (PIN). Photo by Jeff Zoline.

Several people have used Chicago Cityscape to try and find who owns a property. Since I’ve got property tax data for 2,013,563 individually billed pieces of property in Cook County I can help them research that answer.

The problem, though, is that the data, from the Cook County combined property tax  website, only shows who receives the property tax bills – the recipient – who isn’t always the property’s owner.

The combined website is a great tool. Property value info comes from the Assessor’s office. Sales data comes from the Recorder of Deeds, which is another, separately elected, Cook County government agency. Finally, the Treasurer’s office, a third agency, also with a separately elected leader, sends the bills and collects the tax.

The following is a list of the top 100 (or so) “property tax bill recipients” in Cook County for the tax years 2010 to 2014, ranked by the number of associated Property Index Numbers.

Many PINs have changed recipients after being sold or divided, and the data only lists the recipient at its final tax year. A tax bill for Unit 1401 at 235 W Van Buren St was at one time sent to “235 VAN BUREN, CORP” (along with 934 other bills), but in 2011 the PIN was divided after the condo unit was sold.

Of the 100 names, DataMade’s new “probablepeople” name parsing Python script identified 13 as persons. It mistakenly identified eight names as “Person”, leaving five people in the top 100.

The actual number is closer to 90, arrived at by combining 5 names that seem to be the same (using OpenRefine’s clustering function) and removing 5 “to the current taxpayer” and empty names. You’ll notice “Altus” listed four times (they’re based in Phoenix) and Chicago Title Land Trust, which can help property owners remain private, listed twice (associated with 643 PINs).

[table id=2 /]

Not all demolition activity is alike on Chicago’s North and South Sides

According to building permit activity in the past 30 days there are more demolitions on the North Side than on the South Side, but most of those will have new buildings.

According to building permit activity in the past 30 days there are more demolitions on the North Side than on the South Side, but most of those will have new buildings. Map by Chicago Cityscapen using Mapbox satellite imagery.

I review the new permits each day on Chicago Cityscape, both in the list and map views, to get a sense of what’s being built around the city. This keeps me informed so I can tweet interesting permits, respond to people’s questions several times a week, and even help out on a Moxie tour of Motor Row and the Cermak Green Line station by mention new construction and renovation permits I’ve seen pop up nearby.

This week I was poking around the Demolitions Tracker and saw that, not unusually, there were a lot of houses being demolished on the South Side. So I checked out a 30-day view of demolitions in the full-screen map that shows the entire city.

While the North Side has more demolitions going on than the South Side, most of them are part of teardowns – the demolished buildings are getting replaced or allowing for expansion. Demolitions on the South Side are much less likely to have associated new construction projects than the North Side.

Tina Fassett Smith tweeted back:

Use Turf to perform GIS functions in a web browser

Turf's merge function joins invisible buffers around each Divvy station into a single, super buffer.

Turf’s merge function joins invisible buffers around each Divvy station into a single, super buffer –all client-side, in your web browser.

I’m leading the development of a website for Slow Roll Chicago that shows the distribution of bike lane infrastructure in Chicago relative to key and specific demographics to demonstrate if the investment has been equitable.

We’re using GitHub to store code, publish meeting notes, and host discussions with the issues tracker. Communication is done almost entirely in GitHub issues. I chose GitHub over Slack and Google Groups because:

  1. All of our research and code should be public and open source so it’s clear how we made our assumptions and came to our conclusions (“show your work”).
  2. Using git, GitHub, and version control is a desirable skill and more people should learn it; this project will help people apply that skill.
  3. There are no emails involved. I deplore using email for group communication.*

The website focuses on using empirical research, maps, geographic analysis to tell the story of bike lane distribution and requires processing this data using GIS functions. Normally the data would be transformed in a desktop GIS software like QGIS and then converted to a format that can be used in Leaflet, an open source web mapping library.

Relying on desktop software, though, slows down development of new ways to slice and dice geographic data, which, in our map, includes bike lanes, wards, Census tracts, Divvy stations, and grocery stores (so far). One would have to generate a new dataset if our goals or needs changed .

I’ve built maps for images and the web that way enough in the past and I wanted to move away from that method for this project and we’re using Turf.js to replicate many GIS functions – but in the browser.

Yep, Turf makes it possible to merge, buffer, contain, calculate distance, transform, dissolve, and perform dozens of other functions all within the browser, “on the fly”, without any software.

After dilly-dallying in Turf for several weeks, our group started making progress this month. We have now pushed to our in-progress website a map with three features made possible by Turf:

  1. Buffer and dissolving buffers to show the Divvy station walk shed, the distance a reasonable person would walk from their home or office to check out a Divvy station. A buffer of 0.25 miles (two Chicago blocks) is created around each of the 300 Divvy stations, hidden from display, and then merged (dissolved in traditional GIS parlance) into a single buffer. The single buffer –called a “super buffer” in our source code – is used for another feature. Currently the projection is messed up and you see ellipsoid shapes instead of circles.
  2. Counting grocery stores in the Divvy station walk shed. We use the “feature collection” function to convert the super buffer into an object that the “within” function can use to compare to a GeoJSON object of grocery stores. This process is similar to the “select by location” function in GIS software. Right now this number is printed only to the console as we look for the best way to display stats like this to the user. A future version of the map could allow the user to change the 0.25 miles distance to an arbitrary distance they prefer.
  3. Find the nearest Divvy station from any place on the map. Using Turf’s “nearest” function and the Context Menu plugin for Leaflet, the user can right-click anywhere on the map and choose “Find nearby Divvy stations”. The “nearest” function compares the place where the user clicked against the GeoJSON object of Divvy stations to select the nearest one. The problem of locating 2+ nearby Divvy stations remains. The original issue asked to find the number of Divvy stations near the point; we’ll likely accomplish this by drawing an invisible, temporary buffer around the point and then using “within” to count the number of stations inside that buffer and then destroy the buffer.
Right-click the map and select "Find nearby Divvy stations" and Turf will locate the nearest Divvy station.

Right-click the map and select “Find nearby Divvy stations” and Turf will locate the nearest Divvy station.

* I send one email to new people who join us at Open Gov Hack Night on Tuesdays at the Mart to send them a link to our GitHub repository, and to invite them to a Dropbox folder to share large files for those who don’t learn to use git for file management.

Study recommends designing bike paths for bikes, not for sharing

running and bicycling on the lakefront trail

Some parts of the Chicago Lakefront Trail have a 2-foot wide side path designed for pedestrians but the study reviewed very cases where bicyclists crashed with pedestrians. It’s unknown if this side path results in fewer crashes than the parts of the Lakefront Trail without it. Photo: Eric Allix Rogers

The Active Transportation Alliance posted a link on Facebook to a new study [PDF] in the Open BMJ from December 2014, a free peer-reviewed “version” of the British Medical Journal, and said it “conclusively shows separated space reduces crashes and the severity of injuries when crashes occur”.

The posting was in the context that bicyclists and pedestrians using the Lakefront Trail should have separate paths. I agree, but there’s a problem in how they stated the study’s support for separation.

I wouldn’t say that the study “conclusively” shows that bicyclists would fare better on paths without pedestrians. The study reviewed fewer than 700 crashes in two Canadian cities. Only 11% of those crashes occurred on multi-use trails and only 5.9% of the crashes were with a pedestrian or animal (the two were grouped) – this sample size is too low from which to draw that kind of conclusion.

The study didn’t report the association (correlation) between crash severity and pedestrians. The authors didn’t even recommend that bicyclists and pedestrians have separate paths, but instead wrote “These results suggest an urgent need to provide bike facilities…that are designed [emphasis added] specifically for bicycling rather than for sharing with pedestrians”.

While I don’t discount the possibility that separating bicyclists and pedestrians in off-street corridors will reduce the number of crashes and injuries when those two user groups collide, it’s imprudent to call this study “conclusive” when linking it to the measure of crashes between bicyclists and pedestrians.

The study’s primary conclusion was that bicyclists crash more often when there are high slopes, fast moving cars, or no bike lanes and the authors recommended that bike infrastructure be built for bicycling. That’s a solid recommendation and I recommended that you sign Active Trans’s petition advocating for separate facilities in the North Lake Shore Drive reconstruction project.

Working with ZIP code data (and alternatives to using sketchy ZIP code data)

1711 North Kimball Avenue, built 1890

This building at 1711 N Kimball no longer receives mail and the local mail carrier would mark it as vacant. After a minimum length of time the address will appear in the United States Postal Service’s vacancy dataset, provided by the federal Department of Housing and Urban Development. Photo: Gabriel X. Michael.

Working with accurate ZIP code data in your geographic publication (website or report) or demographic analysis can be problematic. The most accurate dataset – perhaps the only one that could be called reliably accurate – is one that you purchase from one of the United States Postal Service’s (USPS) authorized resellers. If you want to skip the introduction on what ZIP codes really represent, jump to “ZIP-code related datasets”.

Understanding what ZIP codes are

In other words the post office’s ZIP code data, which they use to deliver mail and not to locate people like your publication or analysis, is not free. It is also, unbeknownst to many, a dataset that lists mail carrier routes. It’s not a boundary or polygon, although many of the authorized resellers transform it into a boundary so buyers can geocode the location of their customers (retail companies might use this for customer tracking and profiling, and petition-creating websites for determining your elected officials).

The Census Bureau has its own issues using ZIP code data. For one, the ZIP code data changes as routes change and as delivery points change. Census boundaries needs to stay somewhat constant to be able to compare geographies over time, and Census tracts stay the same for a period of 10 years (between the decennial surveys).

Understanding that ZIP codes are well known (everybody has one and everybody knows theirs) and that it would be useful to present data on that level, the Bureau created “ZIP Code Tabulation Areas” (ZCTA) for the 2000 Census. They’re a collection of Census tracts that resemble a ZIP code’s area (they also often share the same 5-digit identifiers). The ZCTA and an area representing a ZIP code have a lot of overlap and can share much of the same space. ZCTA data is freely downloadable from the Census Bureau’s TIGER shapefiles website.

There’s a good discussion about what ZIP codes are and aren’t on the GIS StackExchange.

Chicago example of the problem

Here’s a real world example of the kinds of problems that ZIP code data availability and comprehension: Those working on the Chicago Health Atlas have run into this problem where they were using two different datasets: ZCTA from the Census Bureau and ZIP codes as prepared by the City of Chicago and published on their open data portal. Their solution, which is really a stopgap measure and needs further review not just by those involved in the app but by a diverse group of data experts, was to add a disclaimer that they use ZCTAs instead of the USPS’s ZIP code data.

ZIP-code related datasets

Fast forward to why I’m telling you all of this: The U.S. Department of Housing and Urban Development (HUD) has two ZIP-code based datasets that may prove useful to mappers and researchers.

1. ZIP code crosswalk files

This is a collection of eight datasets that link a level of Census geography to ZIP codes (and the reverse). The most useful to me is ZIP to Census tract. This dataset tells you in which ZIP code a Census tract lies (including if it spans multiple ZIP codes). HUD is using data from the USPS to create this.

The dataset is documented well on their website and updated quarterly, going back to 2010. The most recent file comes as a 12 MB Excel spreadsheet.

2. Vacant addresses

The USPS employs thousands of mail carriers to delivery things to the millions of households across the country, and they keep track of when the mail carrier cannot delivery something because no one lives in the apartment or house anymore. The address vacancy data tells you the following characteristics at the Census tract level:

  • total number of addresses the USPS knows about
  • number of addresses on urban routes to which the mail carrier hasn’t been able to delivery for 90 days and longer
  • “no-stat” addresses: undeliverable rural addresses, places under construction, urban addresses unlikely to be active

You must register to download the vacant addresses data and be a governmental entity or non-profit organization*, per the agreement** HUD has with USPS. Learn more and download the vacancy data which they update quarterly.

Tina Fassett Smith is a researcher at DePaul University’s Institute of Housing Studies and reviewed part of this blog post. She stresses to readers to ignore the “no-stat” addresses in the USPS’s vacancy dataset. She said that research by her and her colleagues at the IHS concluded this section of the data is unreliable. Tina also said that the methodology mail carriers use to identify vacant addresses and places under change (construction or demolition) isn’t made public and that mail carriers have an incentive to collect the data instead of being compensated normally. Tina further explained the issues with no-stat.

We have seen instances of a relationship between the number of P.O. boxes (i.e., the presence of a post office) and the number of no-stats in an area. This is one reason we took it off of the IHS Data Portal. We have not found it to be a useful data set for better understanding neighborhoods or housing markets.

The Institute of Housing Studies provides vacancy data on their portal for those who don’t want to bother with the HUD sign-up process to obtain it.

* It appears that HUD doesn’t verify your eligibility.

** This agreement also states that one can only use the vacancy data for the “stated purpose”: “measuring and forecasting neighborhood changes, assessing neighborhood needs, and measuring/assessing the various HUD programs in which Users are involved”.