Tag: open data

A map of maps

The map of maps.

Over on my website Chicago Cityscape I’ve assembled a map of maps: There are 20,432 maps in 36 layers. You might say there are 36 maps, and each of those maps has an arbitrary number of boundaries within. I say there are 20,000+ maps because there’s a unique webpage for each of them that can tell you even more information about that map.

This post is to throw out some analysis of these maps, in addition to the simple counts above.

The data comes from the City of Chicago, Cook County, and the U.S. Census Bureau. Some layers have come from bespoke sources, including the entrances of CTA and Metra stations drawn by Yonah Freemark and me for Transit Explorer. The sections of the Chicago River were divided and sliced by the Metropolitan Planning Council. The neighborhood and business organizations layers were drawn by me, by interpreting textual descriptions of the organizations’ boundaries, or by visually copying an organization’s own map.

There are 6,879 unique words longer than 2 characters in the metadata of this map of maps. The most common word is “annexation”, which makes sense, given that the layer with the most maps shows the 10,668 Cook County annexation actions since 1830, when the first known plat was incorporated in the City of Chicago.
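For anyone who wants to try a similar count, here is a minimal Python sketch. It is not the script behind the numbers above; the sample strings are made up, and the rule for what counts as a "word" (runs of letters, lowercased) is an assumption.

```python
import re
from collections import Counter

def count_words(texts, min_length=3):
    """Count words of at least min_length characters across metadata strings."""
    counts = Counter()
    for text in texts:
        # Lowercase so "Annexation" and "annexation" count as one word.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if len(w) >= min_length)
    return counts

# Hypothetical metadata strings standing in for the real map records.
sample_metadata = [
    "Annexation to the City of Chicago",
    "Annexation of unincorporated Cook County land",
    "Planned Development 123",
]
word_counts = count_words(sample_metadata)
print(len(word_counts), "unique words")
print(word_counts.most_common(3))
```

Setting `min_length=3` mirrors the "longer than 2 characters" rule, so short words like "of" and "to" never enter the tally.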

The GeoJSON file (GeoJSON is an open, human-readable GIS format) comes out to 30 MB, and it may break your browser when you try to display this layer.

The next group of words are also generic, like “planned” and “development”, related to the Planned Development kind of zoning process in Chicago – called Planned Unit Development in other jurisdictions.

After that, some names of municipalities that traded back and forth between unincorporated Cook County and incorporated municipalities are on the list.

Working down the list, however, it gets really boring and I’m going to stop. I bet if you’re a smarter data science person you can find more interesting patterns in the words, but I’ve also increased the number of generic words (like planned development) by adding these as keywords to each map’s “full text search” index, to ensure that they would respond to a variety of search phrases from users.

How to extract highways and subway lines from OpenStreetMap as a shapefile

It’s possible to use Overpass Turbo to extract any object from the OpenStreetMap “planet” and convert it from a GeoJSON or KML file to a shapefile for manipulation and analysis in GIS.

Say you want the subway lines for Mexico City, and you can’t find a GTFS file that you could convert to shapefile, and you can’t find the right files on Sistema de Transporte Colectivo’s website (I didn’t look for it).

Here’s how to extract the subway lines that are shown in OpenStreetMap and save them as a GIS shapefile.

This is my second tutorial to describe using Overpass Turbo. The first extracted places of worship in Cook County. I’ve also used Overpass Turbo to extract a map of campgrounds.

Extract free and open source data from OpenStreetMap

  1. Open the Overpass Turbo website and, on the map, search for the city from which you want to extract data. (The Overpass query will be generated in such a way that it’ll only search for data in the current map view.)
  2. Click the “Wizard” button in the top toolbar. (Alternatively you can copy the code below and paste it into the text area on the website and click the “Run” button.)
  3. In the Wizard dialog box, type in “railway=subway” in order to find metro, subway, or rapid transit lines. (If you want to download interstate highways, or what they call motorways in the UK, use “highway=motorway”.) Then click the “build and run query” button.
  4. In a few seconds you’ll see lines and dots (representing the metro or subway stations) on the map, and a new query in the text area. Notice that the query has looked for three kinds of objects: node (points/stations), way (the subway tracks), relation (the subway routes).
  5. If you don’t want a particular kind of object, then delete its line from the query and click the “Run” button. (You probably don’t want relation if you’re just needing GIS data for mapping purposes, and because routes are not always well-defined by OpenStreetMap contributors.)
  6. Download the data by clicking the “Export” button. Choose from one of the first three options (GeoJSON, GPX, KML). If you’re going to use a desktop GIS software, or place this data in a web map (like Leaflet), then choose GeoJSON. Now, depending on what browser you’re using, a couple things could happen after you click on GeoJSON. If you’re using Chrome then clicking it will download a file. If you’re using Safari then clicking it will open a new tab and put the GeoJSON text in there. Copy and paste this text into TextEdit and save the file as “mexico_city_subway.geojson”.
Overpass Turbo screenshot 1 of 2

Screenshot 1: After searching for the city for which you want to extract data (Mexico City in this case), click the “Wizard” button and type “railway=subway” and click run.

Overpass Turbo screenshot 2

Screenshot 2: After building and running the query from the Wizard you’ll see subway lines and stations.

Overpass Turbo screenshot 3

Screenshot 3: Click the Export button and click GeoJSON. In Chrome, a file will download. In Safari, a new tab with the GeoJSON text will open (copy and paste this into TextEdit and save it as “mexico_city_subway.geojson”).

Convert the free and open source data into a shapefile

  1. After you’ve downloaded (via Chrome) or re-saved (Safari) a GeoJSON file of subway data from OpenStreetMap, open QGIS, the free and open source GIS desktop application for Linux, Windows, and Mac.
  2. In QGIS, add the GeoJSON file to the table of contents by either dragging the file in from the Finder (Mac) or Explorer (Windows), or by clicking File>Open and browsing and selecting the file.
  3. Convert it to a shapefile by right-clicking on the layer in the table of contents and clicking “Save As…”
  4. In the “Save As…” dialog box choose “ESRI Shapefile” from the dropdown menu. Then click “Browse” to find a place to save this file, check “Add saved file to map”, and click the “OK” button.
  5. A new layer will appear in your table of contents. In the map this new layer will be layered directly above your GeoJSON data.
Overpass Turbo screenshot 4

Screenshot 4: The GeoJSON file exported from Overpass Turbo has now been loaded into the QGIS table of contents.

Overpass Turbo screenshot 5

Screenshot 5: In QGIS, right-click the layer, select “Save As…” and set the dialog box to have these settings before clicking OK.
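One thing to keep in mind when converting: a shapefile holds a single geometry type, while an Overpass Turbo export can mix points (stations) and lines (tracks). The sketch below, in plain Python with no GIS libraries, groups a GeoJSON FeatureCollection by geometry type so each group could be saved separately. The tiny `export` dictionary is a made-up stand-in for a real download.

```python
from collections import defaultdict

def split_by_geometry_type(feature_collection):
    """Group a GeoJSON FeatureCollection's features by geometry type."""
    groups = defaultdict(list)
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        groups[geometry.get("type", "None")].append(feature)
    # Wrap each group in its own FeatureCollection.
    return {
        geom_type: {"type": "FeatureCollection", "features": features}
        for geom_type, features in groups.items()
    }

# Hypothetical export: one station (Point) and one track (LineString).
export = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"railway": "station"},
         "geometry": {"type": "Point", "coordinates": [-99.13, 19.43]}},
        {"type": "Feature", "properties": {"railway": "subway"},
         "geometry": {"type": "LineString",
                      "coordinates": [[-99.13, 19.43], [-99.14, 19.44]]}},
    ],
}
layers = split_by_geometry_type(export)
for geom_type, layer in layers.items():
    print(geom_type, len(layer["features"]))
```

Each value in `layers` is itself valid GeoJSON, so it could be written out with `json.dump` and loaded into QGIS as its own layer.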

Query for finding subways in your current Overpass Turbo map view

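/*
This has been generated by the overpass-turbo wizard.
The original search was: “railway=subway”
*/
[out:json][timeout:25];
// gather results
(
  // query part for: “railway=subway”
  node["railway"="subway"]({{bbox}});
  way["railway"="subway"]({{bbox}});
  /*relation is for "routes", which are not always
  well-defined, so I would ignore it*/
  relation["railway"="subway"]({{bbox}});
);
// print results
out body;
>;
out skel qt;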
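If you’d rather assemble a query like this in code than in the Wizard, a rough Python sketch follows. It only builds the query text and does not contact any server. The function name and the bounding box values are hypothetical; the one fixed rule it relies on is that Overpass QL bounding boxes are written as (south, west, north, east).

```python
def build_overpass_query(key, value, bbox, include_relations=False):
    """Return an Overpass QL query for key=value inside a bounding box.

    bbox is (south, west, north, east) in decimal degrees, the order
    Overpass QL expects.
    """
    south, west, north, east = bbox
    area = f"({south},{west},{north},{east})"
    kinds = ["node", "way"]
    if include_relations:
        # Relations carry routes, which are not always well defined.
        kinds.append("relation")
    parts = [f'  {kind}["{key}"="{value}"]{area};' for kind in kinds]
    lines = ["[out:json][timeout:25];", "("] + parts
    lines += [");", "out body;", ">;", "out skel qt;"]
    return "\n".join(lines)

# Hypothetical bounding box roughly around central Mexico City.
query = build_overpass_query("railway", "subway", (19.30, -99.25, 19.55, -99.00))
print(query)
```

Leaving `include_relations` off by default matches the advice above to skip relations when you only need geometry for mapping.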

How to use Chicago Cityscape’s upgraded names search tool

Search for names of people who do business in Chicago.

I created a combined dataset of over 2 million names, including contractors, architects, business names, and business owners and their shareholders, from Chicago’s open data portal, and property owners/managers from the property tax database. It’s one of three new features published in the last couple of weeks.

Type a person or company name in the search bar and press “search”. In less than 1 second you’ll get results and a hint as to what kind of records we have.

What should you search?

Take any news article about a Chicago kinda situation, like this recent Chicago Sun-Times article about the city using $8 million in taxpayer-provided TIF district money to move the Harriet Rees house one block. The move made way for a taxpayer-funded property acquisition on which the DePaul/McCormick Place stadium will be built.

The CST is making the point that something about the house’s sale and movement is sketchy (although I don’t know if they showed that anything illegal happened).

There’re a lot of names in the article, but here are some of the ones we can find info about in Chicago Cityscape.

Salvatore Martorina – an architect & building permit expeditor, although this name is connected to a lot of other names on the business licenses section of Cityscape

Oscar Tatosian – rug company owner, who owned the vacant lot to which the Rees house was moved

Bulley & Andrews – construction company which moved the house

There were no records for the one attorney and two law firms mentioned.

Who are the top property owners in Cook County?

235 West Van Buren Street

There are several hundred condo units in the building at 235 W Van Buren Street, and each unit is associated with multiple Property Index Numbers (PIN). Photo by Jeff Zoline.

Several people have used Chicago Cityscape to try and find who owns a property. Since I’ve got property tax data for 2,013,563 individually billed pieces of property in Cook County I can help them research that answer.

The problem, though, is that the data, from the Cook County combined property tax website, only shows who receives the property tax bills – the recipient – who isn’t always the property’s owner.

The combined website is a great tool. Property value info comes from the Assessor’s office. Sales data comes from the Recorder of Deeds, which is another, separately elected, Cook County government agency. Finally, the Treasurer’s office, a third agency, also with a separately elected leader, sends the bills and collects the tax.

The following is a list of the top 100 (or so) “property tax bill recipients” in Cook County for the tax years 2010 to 2014, ranked by the number of associated Property Index Numbers.

Many PINs have changed recipients after being sold or divided, and the data only lists the recipient at its final tax year. A tax bill for Unit 1401 at 235 W Van Buren St was at one time sent to “235 VAN BUREN, CORP” (along with 934 other bills), but in 2011 the PIN was divided after the condo unit was sold.

Of the 100 names, DataMade’s new “probablepeople” name parsing Python script identified 13 as persons. It mistakenly identified eight names as “Person”, leaving five people in the top 100.

The actual number is closer to 90, arrived at by combining 5 names that seem to be the same (using OpenRefine’s clustering function) and removing 5 “to the current taxpayer” and empty names. You’ll notice “Altus” listed four times (they’re based in Phoenix) and Chicago Title Land Trust, which can help property owners remain private, listed twice (associated with 643 PINs).
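The ranking itself is simple to sketch in Python. The version below is an illustration only: it uses a crude normalization (uppercase, strip punctuation) rather than OpenRefine’s clustering, and the `(PIN, recipient)` rows are invented placeholders, not real tax records.

```python
from collections import Counter

def normalize_name(name):
    """Collapse case, punctuation, and extra spaces so near-identical names match."""
    cleaned = "".join(ch for ch in name.upper() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

def rank_recipients(records, skip=("", "TO THE CURRENT TAXPAYER")):
    """Count PINs per normalized recipient name, skipping placeholder names."""
    counts = Counter()
    for _pin, recipient in records:
        name = normalize_name(recipient)
        if name not in skip:
            counts[name] += 1
    return counts.most_common()

# Hypothetical (PIN, recipient) rows standing in for the tax bill data.
records = [
    ("pin-001", "235 VAN BUREN, CORP"),
    ("pin-002", "235 Van Buren Corp."),
    ("pin-003", "To the current taxpayer"),
    ("pin-004", ""),
    ("pin-005", "CHICAGO TITLE LAND TRUST"),
]
print(rank_recipients(records))
```

Even this simple normalization merges the two spellings of the Van Buren name and drops the empty and “to the current taxpayer” rows, which is the same kind of cleanup described above.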

[table id=2 /]

Working with ZIP code data (and alternatives to using sketchy ZIP code data)

1711 North Kimball Avenue, built 1890

This building at 1711 N Kimball no longer receives mail and the local mail carrier would mark it as vacant. After a minimum length of time the address will appear in the United States Postal Service’s vacancy dataset, provided by the federal Department of Housing and Urban Development. Photo: Gabriel X. Michael.

Working with accurate ZIP code data in your geographic publication (website or report) or demographic analysis can be problematic. The most accurate dataset – perhaps the only one that could be called reliably accurate – is one that you purchase from one of the United States Postal Service’s (USPS) authorized resellers. If you want to skip the introduction on what ZIP codes really represent, jump to “ZIP-code related datasets”.

Understanding what ZIP codes are

In other words, the post office’s ZIP code data, which it uses to deliver mail and not to locate people the way your publication or analysis would, is not free. It is also, unbeknownst to many, a dataset that lists mail carrier routes. It’s not a boundary or polygon, although many of the authorized resellers transform it into a boundary so buyers can geocode the location of their customers (retail companies might use this for customer tracking and profiling, and petition-creating websites for determining your elected officials).

The Census Bureau has its own issues using ZIP code data. For one, the ZIP code data changes as routes change and as delivery points change. Census boundaries need to stay somewhat constant to be able to compare geographies over time, and Census tracts stay the same for a period of 10 years (between the decennial surveys).

Understanding that ZIP codes are well known (everybody has one and everybody knows theirs) and that it would be useful to present data on that level, the Bureau created “ZIP Code Tabulation Areas” (ZCTA) for the 2000 Census. They’re a collection of Census blocks that resemble a ZIP code’s area (they also often share the same 5-digit identifiers). The ZCTA and an area representing a ZIP code have a lot of overlap and can share much of the same space. ZCTA data is freely downloadable from the Census Bureau’s TIGER shapefiles website.

There’s a good discussion about what ZIP codes are and aren’t on the GIS StackExchange.

Chicago example of the problem

Here’s a real world example of the kinds of problems that ZIP code data availability and comprehension can cause: Those working on the Chicago Health Atlas ran into trouble because they were using two different datasets: ZCTA from the Census Bureau and ZIP codes as prepared by the City of Chicago and published on their open data portal. Their solution, which is really a stopgap measure and needs further review not just by those involved in the app but by a diverse group of data experts, was to add a disclaimer that they use ZCTAs instead of the USPS’s ZIP code data.

ZIP-code related datasets

Fast forward to why I’m telling you all of this: The U.S. Department of Housing and Urban Development (HUD) has two ZIP-code based datasets that may prove useful to mappers and researchers.

1. ZIP code crosswalk files

This is a collection of eight datasets that link a level of Census geography to ZIP codes (and the reverse). The most useful to me is ZIP to Census tract. This dataset tells you in which ZIP code a Census tract lies (including if it spans multiple ZIP codes). HUD is using data from the USPS to create this.

The dataset is documented well on their website and updated quarterly, going back to 2010. The most recent file comes as a 12 MB Excel spreadsheet.
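To give a feel for how a crosswalk is used, here is a small Python sketch that looks up the ZIP codes for each tract and orders them by share. The column names (`tract`, `zip`, `res_ratio`) and every value in the sample are assumptions made for illustration; check HUD’s documentation for the real column names and for how each ratio is defined in the file you download.

```python
import csv
import io
from collections import defaultdict

def zips_for_tracts(rows):
    """Map each tract to its ZIP codes, largest residential share first."""
    tract_to_zips = defaultdict(list)
    for row in rows:
        tract_to_zips[row["tract"]].append((row["zip"], float(row["res_ratio"])))
    return {
        tract: sorted(pairs, key=lambda pair: pair[1], reverse=True)
        for tract, pairs in tract_to_zips.items()
    }

# Hypothetical crosswalk extract; the second tract spans two ZIP codes.
sample_csv = """tract,zip,res_ratio
17031010100,60626,1.0
17031020100,60659,0.3
17031020100,60645,0.7
"""
crosswalk = zips_for_tracts(csv.DictReader(io.StringIO(sample_csv)))
for tract, pairs in crosswalk.items():
    print(tract, pairs)
```

Keeping every ZIP for a tract, rather than just the largest, preserves the cases where a tract spans multiple ZIP codes.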

2. Vacant addresses

The USPS employs thousands of mail carriers to deliver things to the millions of households across the country, and they keep track of when the mail carrier cannot deliver something because no one lives in the apartment or house anymore. The address vacancy data tells you the following characteristics at the Census tract level:

  • total number of addresses the USPS knows about
  • number of addresses on urban routes to which the mail carrier hasn’t been able to deliver for 90 days and longer
  • “no-stat” addresses: undeliverable rural addresses, places under construction, urban addresses unlikely to be active

You must register to download the vacant addresses data and be a governmental entity or non-profit organization*, per the agreement** HUD has with USPS. Learn more and download the vacancy data which they update quarterly.

Tina Fassett Smith is a researcher at DePaul University’s Institute of Housing Studies and reviewed part of this blog post. She urges readers to ignore the “no-stat” addresses in the USPS’s vacancy dataset. She said that research by her and her colleagues at the IHS concluded this section of the data is unreliable. Tina also said that the methodology mail carriers use to identify vacant addresses and places under change (construction or demolition) isn’t made public and that mail carriers have an incentive to collect the data instead of being compensated normally. Tina further explained the issues with no-stat:

We have seen instances of a relationship between the number of P.O. boxes (i.e., the presence of a post office) and the number of no-stats in an area. This is one reason we took it off of the IHS Data Portal. We have not found it to be a useful data set for better understanding neighborhoods or housing markets.

The Institute of Housing Studies provides vacancy data on their portal for those who don’t want to bother with the HUD sign-up process to obtain it.

* It appears that HUD doesn’t verify your eligibility.

** This agreement also states that one can only use the vacancy data for the “stated purpose”: “measuring and forecasting neighborhood changes, assessing neighborhood needs, and measuring/assessing the various HUD programs in which Users are involved”.

How to ascertain the area of Chicago beach parking lots to find the largest one

This tutorial is a direct response to a question about which Chicago beach has the largest parking lot. Matt Nardella of Moss Design, in a response to a Twitter-based conversation about Alderman Cappleman’s suggestion that perhaps Montrose beach has too much parking, researched on Wikipedia to find the answer. This is where it said that Montrose beach has the largest parking lot of any of Chicago’s 27 beaches.

Now we’re going to try and prove which beach has the largest associated parking lot.

This tutorial will teach you how to (1) display Chicago beaches, (2) download data held in OpenStreetMap, (3) find the parking lots within the OpenStreetMap data, (4) find the parking lots near the beaches, and (5) calculate each parking lot’s area (in square feet). You can use this tutorial to accomplish any one of these five tasks, or the same tasks but on a different part of OpenStreetMap data (like the area of indoor shopping malls).

You’ll need the QGIS software before starting. You’ll also need at least 500 MB of free space. Start a project folder called “Biggest Parking Lots in Chicago” and make two more folders, within this folder, called “origdata” and “data”.

First, let’s get some data about beaches

Since we only want to know about the parking lots near Chicago beaches we need to get a dataset that locates them. This data is presumably within the same OpenStreetMap extract we’re waiting for, but it’s best to go to the most reliable source.

  1. Download the Parks – Facilities & Features shapefile from the City of Chicago open data portal. I’ve already verified that it has all the beaches (as points).
  2. Open the parks shapefile in a new document in QGIS (call it “map01a.qgs”). You might not see the data so right-click the parks layer and select “Zoom to layer extent”.
  3. Filter out all the points that aren’t beaches by using the query builder. Right-click the layer and select “Filter…” and input this filter expression: "FACILITY_N" = 'BEACH'
  4. Your map will now show 26 points along an invisible lakefront and then the beach at Humboldt Park.
  5. For the rest of this tutorial we’ll reference the beaches layer as ParkFacilities.

Second, let’s get some data from OpenStreetMap

The easiest way to grab data from OpenStreetMap is by using QGIS, a free, open source desktop GIS application that has myriad plugins that match the capabilities of the heavyweight ESRI ArcGIS line of software. We can download OpenStreetMap data straight into QGIS.

  1. Click on the Vector menu and select OpenStreetMap>Download data.
  2. We want as much data as will cover the beaches information so in the Extent section of the dialog box choose “From layer” and select the beaches layer (called ParkFacilities).
  3. Browse to the “origdata” folder you created in the first task and choose the filename “chicago.osm”.
  4. Click OK and watch the progress meter tell you how much data you’ve downloaded from OpenStreetMap.
  5. Once it’s completed downloading, click “Close”. Now we want to add this data to our map.
  6. Drag the chicago.osm file from your file system into the QGIS Layers list. A dialog box will appear asking which layers you want to add.
  7. Select the layer that has the type “MultiPolygon”. This represents areas like buildings and parking lots.

Third, display the OpenStreetMap data and eliminate everything but the parking lots

We only want to compare parking lots in this dataset with beaches in the previous dataset so we need to eliminate everything from the OpenStreetMap data that’s not a parking lot. Since OSM data depends on tags we can easily select and show all the objects where “amenity” = “parking”.

  1. Filter out all the polygons that aren’t parking lots by using the query builder. Right-click the layer and select “Filter…” and input this filter expression: "amenity" = 'parking'. Hopefully all the parking lots have been drawn so we can analyze a complete dataset!
  2. Your map will now show little squares, rectangles, and myriad odd shapes that represent parking lots around Chicagoland. (Most of these have been drawn by hand.) It should look like Image XXX.
  3. Since the beaches data is stored in a projection with the codename EPSG:3435 and the OpenStreetMap data is stored with the codename EPSG:4326, we need to convert the parking lots to match the beaches (because we’re going to be using feet as a measuring distance instead of degrees).
  4. Right-click the layer and select “Save As…” and choose the format “ESRI Shapefile”. Then click the top Browse button and select a location on your hard drive for the converted file.
  5. For “CRS” choose “Selected CRS”. Then click the bottom Browse button and search for the EPSG with the codename 3435. Select the checkbox named “Add saved file to map” so the new layer will be immediately added to our map.

Fourth, select all the parking lots near a beach

This task will select all the parking lots near the beaches. I chose 2,000 feet but you could easily choose a different distance. You might want to measure on Google Earth some minimum and maximum distances between beaches and their respective, associated parking lots.

(This task is easier using PostGIS which has a ST_DWithin function to find objects within a certain distance because we can avoid having to create the buffer in QGIS.)

  1. Create a 2,000 feet buffer. Select Vector>Geoprocessing tools>Buffer.
  2. In the Buffer(s) dialog box, select ParkFacilities (which has your beaches) as the “Input vector layer”. Choose a distance of 2000 (the units are pre-chosen by the projection and since we’re using a projection that’s in feet, the distance unit will be feet).
  3. Browse to your project folder’s “data” folder and save the “Output shapefile” as “beaches buffer 2000ft.shp”.
  4. Click “Add results to canvas” and then click OK.
  5. Double check that 2,000 feet was enough to select the parking lots. In my case, I see that the point representing Montrose beach was further than 2,000 feet away from a parking lot.
  6. Let’s do it again but with 3,000 feet this time, and saving the “Output shapefile” as “beaches buffer 3000ft.shp”.
  7. This time it worked and the nearest parking lots are now in the 3,000 feet radius buffer. You can see in Image XXX how the two concentric circles stretch out from the beach point towards the parking lots.

We’re not done. We’re next going to use our newly created 3,000 feet buffers to tell us which parking lots are in them. These will be presumed to be our beach parking lots.

  1. Use the “Select by location” tool to find the parking lots that intersect our 3,000 feet buffers. Select Vector>Research Tools>Select by location.
  2. Follow me: we want to select features in parking 3435 [our parking lots] that intersect features in beaches buffer 3000ft [our beaches]. We’ll modify the current selection by creating a new selection so that we don’t accidentally include any features previously selected.
  3. You’ll now see a bunch of parking lots turn yellow meaning they are actively selected.
  4. Let’s save our selected parking lots as a new file so it will be easier to analyze just them. Right-click “parking 3435” and select “Save Selection As…” (it’s important to choose “Save Selection As” instead of “Save As” because the former will save just the parking lots we’ve selected).
  5. Save it as “selected parking 3435.shp” in your “data” folder. The CRS should be EPSG:3435 (NAD83 Illinois StatePlane East Feet). Check off “Add saved file to map” and click OK.
  6. Turn off all other layers except ParkFacilities to see what we’re left with and you’ll see what I show in Image XXX.
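The buffer-and-select steps above boil down to a distance test, which is easy to sketch in Python once everything is in a projection measured in feet. This is a simplification, not what QGIS does internally: each parking lot is reduced to a single representative point, whereas the buffer intersection tests the whole polygon. All names and coordinates below are made up.

```python
import math

def lots_near_beaches(beaches, lots, max_distance_ft):
    """Return names of lots whose point lies within max_distance_ft of any beach."""
    nearby = []
    for lot_name, (lot_x, lot_y) in lots.items():
        for beach_x, beach_y in beaches.values():
            if math.hypot(lot_x - beach_x, lot_y - beach_y) <= max_distance_ft:
                nearby.append(lot_name)
                break  # one nearby beach is enough
    return nearby

# Hypothetical projected coordinates, in feet.
beaches = {"Beach A": (0.0, 0.0)}
lots = {"near lot": (1500.0, 2000.0), "far lot": (3000.0, 3000.0)}

print(lots_near_beaches(beaches, lots, 2000))
print(lots_near_beaches(beaches, lots, 3000))
```

In this made-up data the “near lot” sits 2,500 feet from the beach, so it is missed at 2,000 feet and caught at 3,000 feet, the same situation I ran into with Montrose beach.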

Fifth, let’s calculate

Calculating the area is probably the easiest part of this tutorial.

  1. Close all attribute tables you may have opened.
  2. Select Vector>Geometry Tools>Export/Add geometry columns and choose “selected parking 3435” as your input vector layer.
  3. Leave all other options as-is and press OK. When told about how QGIS can’t access something simultaneously, choose “Yes”.
  4. QGIS should have told you that “selected parking 3435” has been updated. Right-click the layer and choose “Open Attribute Table”.
  5. Scroll to the far right and you’ll see a new column called AREA. This represents the parking lot’s area in square feet.
  6. Click on the AREA column heading to sort it from smallest to largest. Scroll to the bottom of the list and you’ll find the parking lot with the largest area. Double check – is it near a beach?
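If you’re curious what an area calculation like this involves, the shoelace formula is one standard way to get a polygon’s area from its vertices. The Python sketch below applies it to a made-up rectangular lot with coordinates in feet and converts the result to acres (one acre is 43,560 square feet). It illustrates the math and is not a description of QGIS’s internals.

```python
SQUARE_FEET_PER_ACRE = 43_560

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula (units squared)."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        # Pair each vertex with the next one, wrapping around at the end.
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical 660 ft by 330 ft rectangular parking lot.
lot = [(0, 0), (660, 0), (660, 330), (0, 330)]
area_sqft = polygon_area(lot)
area_acres = area_sqft / SQUARE_FEET_PER_ACRE
print(area_sqft, area_acres)
```

The acre conversion is handy for comparing the AREA column with figures quoted in acres.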


With my analysis, and with the data available from OpenStreetMap when I created this tutorial, there are three abnormally large parking lots:

  1. A linear lot near the Lincoln Park Zoo and North Avenue beach (6.8 acres)
  2. A curving lot near Montrose Beach (4.75 acres)
  3. An irregularly shaped lot near Montrose Beach (4.5 acres)

There’s one major caveat in this analysis and that’s the missing parking lots on beaches south of Navy Pier. This means that no one has drawn them into OpenStreetMap so it’s time to start editing!

Chicago wards with the most landmarked places

Montgomery Ward Complex

People float by the Montgomery Ward Complex in kayaks. Photo by Michelle Anderson.

Last week I met with the passionate staff at Landmarks Illinois to talk about Licensed Chicago Contractors. I wanted to understand the legal side of historic preservation and determine ways to highlight landmarked structures on the website and track any modifications or demolitions to them.

I incorporated two new geographies over the weekend: Chicago landmark districts, and properties and areas on the National Register of Historic Places (both available on the City of Chicago open data portal).

I used pgShapeLoader to import them to my DigitalOcean-hosted PostgreSQL database and modified some existing code to start looking at these two new datasets. Voila, you can now track what’s going on in the Montgomery Ward Company Complex – currently occupied by “600 W” (at 600 W Chicago Avenue) hosting Groupon among other businesses and restaurants.

Today I was messing around with some queries after I saw that the ward containing this place on the National Register – the 27th – also had a bunch of other listed spots.

I wrote a query to see which wards have the most places on the National Register. The table below lists the top three wards, with links to their page on Licensed Chicago Contractors. You’ll find that many have no building permits associated with them. There are two reasons for this: the listing’s small geography to look within for permits may not include the geography of issued permits (they’re a few feet off), and we don’t have a copy of all permits yet.

[table id=15 /]

Four wards don’t have any listings on the National Register of Historic Places and nine wards have one listing.
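The tallies above came from a database query, but the counting logic can be sketched in a few lines of Python. This is an illustration with invented data: the list of ward numbers stands in for the result of matching each listing to the ward that contains it.

```python
from collections import Counter

WARD_COUNT = 50  # Chicago has 50 wards

def summarize_wards(listing_wards):
    """Return the top three wards plus wards with zero and exactly one listing."""
    per_ward = Counter(listing_wards)
    top_three = per_ward.most_common(3)
    no_listings = [w for w in range(1, WARD_COUNT + 1) if per_ward[w] == 0]
    one_listing = [w for w, count in per_ward.items() if count == 1]
    return top_three, no_listings, one_listing

# Hypothetical ward number for each listing.
listing_wards = [27, 27, 27, 42, 42, 4]
top_three, no_listings, one_listing = summarize_wards(listing_wards)
print(top_three)
print(len(no_listings), "wards with no listings")
print(one_listing)
```

Wards with no listings never appear in the grouped counts, so they have to be found by checking every ward number from 1 to 50.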

Top 20 most active general contractors in Chicago this year so far

River Point tower is under construction at 444 W Lake Street by James McHugh Construction. Its associated building permit, which only comprises the base/train cover structure depicted, has an estimated project cost of $27,050,000. Photo by Bart Shore.

Here I go again not talking about urban planning on Steven Can Plan. With my Licensed Chicago Contractors & Construction Activity website I’ve ingested a lot of data from the City of Chicago’s open data portal that has a LOT to say about what’s going on.

Nearly all permits include a reasonable value that estimates the project’s cost – like $13,150,000.00 for converting Lafayette Elementary School into a performing arts high school. (All demolition permits show $1 and some permits cost over $6 billion, which we know is false.)

I recently reorganized the data (migrating from MySQL to PostgreSQL which supports the JSON datatype) to expand the ways I can extract info and I’ve sorted it by general contractor’s aggregate project value for this year.

This is a list of the 20 most active general contractors in Chicago for 2014 until May 13, 2014. I define “activity” purely by a company’s involvement in a Chicago-based project, measured by that project’s estimated cost. (The data doesn’t specify a level of their involvement in a project.)
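Here’s a rough Python sketch of a ranking like this one. It is not the query behind the table: the permit records, field names, and contractor names are hypothetical, and the cost cutoffs (skipping the $1 placeholders and anything over $1 billion) are assumptions chosen to match the data problems mentioned above.

```python
from collections import defaultdict

def rank_contractors(permits, min_cost=2, max_cost=1_000_000_000, top_n=20):
    """Sum estimated project cost per general contractor, largest first."""
    totals = defaultdict(float)
    for permit in permits:
        cost = permit["estimated_cost"]
        # Skip placeholder ($1) and implausibly large values.
        if min_cost <= cost <= max_cost:
            for contractor in permit["general_contractors"]:
                totals[contractor] += cost
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]

# Hypothetical permits; each credits its full cost to every contractor listed.
permits = [
    {"estimated_cost": 27_050_000, "general_contractors": ["Contractor A"]},
    {"estimated_cost": 1, "general_contractors": ["Contractor B"]},
    {"estimated_cost": 6_500_000_000, "general_contractors": ["Contractor B"]},
    {"estimated_cost": 13_150_000, "general_contractors": ["Contractor B", "Contractor C"]},
    {"estimated_cost": 2_000_000, "general_contractors": ["Contractor A"]},
]
ranking = rank_contractors(permits)
print(ranking)
```

Because the data doesn’t say how involved each company was, the sketch credits a project’s whole estimated cost to every contractor on the permit.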

[table id=11 /]

Do any names ring a bell, or surprise you?

Finding interesting data in the building permits dataset

I had several great conversations with fellow #chihacknight visitors at the 1871 tech hub (222 W Merchandise Mart Plaza) about how to reveal more information about what’s being built in Chicago. I had introduced Licensed Chicago Contractors at the previous week’s hack night and tonight I showed site changes I made like how much faster it is now that I use DataTables’s server-side processing function.

Some of the discussions resulted in suggestions to try new tools and methods that would make processing the data more efficient, or more revealing. What are the ways I can aggregate the data, or connect to similar data from other sources?

One of the new features I announced I’ll be adding is statistics on building activity by neighborhood. I started testing some queries to see the results, and to find the query that outputs that information in a way that’ll pique users’ interests.

I calculated the aggregate estimated costs of all building permit activity for the past 90 days in select neighborhoods. All of the data was automatically generated using a simple MySQL query, but one that will get faster after switching to Postgres. (I eliminated any project whose estimated cost was less than $1,000 because there are many project types that are $0 to several hundred dollars.)

  • Logan Square: 77 projects, totaling $16,295,997.50 at a $211,636.33 average cost
  • West Loop: 30 projects, totaling $27,646,899.00 at a $921,563.30 average cost
  • Andersonville: 6 projects, totaling $358,770.00 at a $59,795.00 average cost
  • Bronzeville: 34 projects, totaling $17,050,662.00 at a $501,490.06 average cost
  • Hyde Park: 20 projects, totaling $13,492,265.00 at a $674,613.25 average cost
  • Humboldt Park: 35 projects, totaling $41,917,988.00 at a $1,197,656.80 average cost
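The per-neighborhood numbers are a count, a sum, and an average after dropping the small projects. A minimal Python sketch of that arithmetic is below; the list of costs is invented, while the last line re-checks the West Loop average from the figures listed above.

```python
def summarize(costs, minimum=1_000):
    """Return (project count, total cost, average cost), ignoring costs below minimum."""
    kept = [cost for cost in costs if cost >= minimum]
    total = sum(kept)
    average = total / len(kept) if kept else 0.0
    return len(kept), total, average

# Hypothetical estimated costs for one neighborhood.
costs = [0, 500, 80_000, 120_000]
print(summarize(costs))

# Sanity check on the published West Loop figures: total divided by projects.
west_loop_average = round(27_646_899.00 / 30, 2)
print(west_loop_average)
```

Filtering before averaging matters: leaving the $0 and $500 projects in would drag the average for this sample well below $100,000.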

How does Humboldt Park double the other neighborhoods’ average? I think it’s pretty simple: this $40 million Salvation Army residence that’s going to be built at 825 N Christiana Avenue.

The results for Bronzeville were higher than I expected because this is a distressed neighborhood that has lost a lot of population and has seen little development in the past several years. This isn’t to say the neighborhood is poor – I saw a report last fall that highlighted how the purchasing power of Bronzeville residents was quite high relative to neighboring communities.

Ronnie Harris showed me the report when I participated in the Center for Neighborhood Technology’s civic app competition and hackathon. We, along with Josh Engel, designed Build It! Bronzeville, although my participation was really pushing them to develop Josh’s game idea more and construct a paper version of it. Our team won the competition and Ronnie and Josh have kept working on it (I saw them at last week’s hack night).

Projects that pushed up Bronzeville’s average included several multi-family homes at around $1.4 million each on the blocks of 4700 and 4800 S Calumet Avenue.

Code discussion

I can’t test for the “Loop” right now in the way I have my data structured because a LIKE '%loop%' query of the database will include “West Loop” records.
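The problem is easy to reproduce. The Python sketch below uses an in-memory SQLite database as a stand-in for my MySQL one, with a made-up `permits` table that has a plain `neighborhood` column (an assumption; it is not how my data is stored today). It shows the substring match pulling in extra neighborhoods and an exact match avoiding that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permits (id INTEGER PRIMARY KEY, neighborhood TEXT)")
conn.executemany(
    "INSERT INTO permits (neighborhood) VALUES (?)",
    [("Loop",), ("West Loop",), ("South Loop",), ("Logan Square",)],
)

# Substring match: catches every neighborhood containing "loop".
fuzzy = [row[0] for row in conn.execute(
    "SELECT neighborhood FROM permits WHERE neighborhood LIKE '%loop%' ORDER BY id"
)]

# Exact match on the whole value: only the Loop itself.
exact = [row[0] for row in conn.execute(
    "SELECT neighborhood FROM permits WHERE lower(neighborhood) = 'loop' ORDER BY id"
)]

print(fuzzy)
print(exact)
conn.close()
```

An exact comparison only works when the neighborhood name sits in its own column or field, which is part of why the storage needs to change.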

I need to change how the building permit data is stored – in my database – a little so that my site’s PHP codebase and MySQL queries can sift through the data faster. For example, I’m storing several key-value pairs as a JSON-encoded string in a TEXT field. One #chihacknight developer suggested I switch from MySQL to PostgreSQL because Postgres has native JSON-parsing functions.
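
To show why a JSON string in a TEXT field is slow to sift through, here is a minimal sketch in Python (not the site's PHP code). The `extra` column, the keys inside it, and the sample values are assumptions invented for the example. The point is that the database only sees opaque text, so the application has to fetch and decode every row before it can filter.

```python
import json
import sqlite3

# Hypothetical table: key-value pairs stored as a JSON string in a TEXT column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permits (id INTEGER, extra TEXT)")
conn.executemany(
    "INSERT INTO permits VALUES (?, ?)",
    [
        (1, json.dumps({"neighborhood": "West Loop", "permit_type": "renovation"})),
        (2, json.dumps({"neighborhood": "Loop", "permit_type": "new construction"})),
    ],
)


def ids_with(key, value):
    """Decode every row's JSON in application code and keep exact matches."""
    matches = []
    for permit_id, extra in conn.execute("SELECT id, extra FROM permits ORDER BY id"):
        if json.loads(extra).get(key) == value:
            matches.append(permit_id)
    return matches


loop_ids = ids_with("neighborhood", "Loop")
```

PostgreSQL's JSON operators, such as `->>`, would let the database do this comparison itself instead of handing every row back to the application.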

I looked up how to use Postgres’s JSON functions and realized that, yes, I probably should do that, but that I also need to change the array structure of the data I’m encoding to JSON. In other words, with a tiny change now, I can be better prepared for the eventual migration to Postgres.

Using open data: Showing what projects licensed Chicago contractors are working on

The New City developer recently received permits for nearly $50 million of construction work across from the Lincoln Park REI.

I wrote in my last post that I found “pain” in the process of finding a licensed contractor in the city (the pain of finding one who can install in the public way remains unmedicated).

I wanted to provide more than a list (and a map) and EveryBlock has already answered “What’s going on across the street from my house?”. I wanted to add value by helping people answer the question, “What contractor should I choose?”

Several other sites help you do this, like BuildZoom, Angie’s List, and the Better Business Bureau, by showing you customer reviews or complaints. I needed something different from mimicking a review site (a lot of the businesses are also on Yelp) so I decided to answer the question, “What projects have these companies done?”

That’s where the City of Chicago’s open data portal comes in: it has a dataset for Building Permits.

Check out 180 Properties, LLC from Skokie, Illinois. They’ve had two permits issued within the last three months. One project, at 3705 N Hoyne Avenue, is for interior renovation: “Remove/replace cabinets, countertops, flooring, patch & repair drywall”. The estimated cost for the project is $80,000. Sound like the kind of contractor you’re looking for? Call them up or keep researching.

You can even see who else is working on this project. Burnham Nationwide is listed as an expeditor on this project which means they’re likely acting as the intermediary between the Chicago Department of Buildings and the companies actually doing the work. Burnham will do site plans, drawings, occupancy, and ensure everything is in order. The property owner is also listed in the permit information.

For people who want to explore construction activity the other way around, finding projects before contractors, I created a “Permits explorer” page. This page searches the Building Permits dataset to show the most recently issued permits for the most expensive projects. Right now a project to alter and renovate Chicago Vocational High School at 2100 E 87th Street has an estimated cost of $40 million. I didn’t realize how much the Department of Buildings is funded by permits until I saw the permit fees.

The permit fee for the school renovation would have been $372,598, but the dataset said the entire fee was waived (likely because it’s a Chicago Public School). Other projects I reviewed had permit fees between $30,000 and $75,000.

Real estate speculators, development watchers, and editors of Curbed Chicago should find browsing permits useful. The list includes two projects associated with the New City development at Halsted Street and Clybourn Avenue, across from the Lincoln Park REI store. The two permits are held by 1515 N Halsted, LLC. The first is for a “3 story steel framed mixed-use retail, restuarant, assembly (movie theater) building” at 1500 N Clybourn Avenue (for an estimated cost of $26,403,193), and the second permit describes a 7 story parking garage at 710 W Schiller Street (for $21,518,012).

How it works

I used my programming magic – I prefer PHP – to query the Socrata Open Data API (or SODA) to look for the given contractor’s name in one of eight name fields (there are 16 name fields) and then return information about the most recent permits. The Building Permits dataset gives the project location, work description, and its estimated cost. I figured you could use the project’s estimated cost to gauge the kind of work the contractor does – is the contractor more familiar with big jobs, or little jobs?

This method isn’t the best. Ideally there’d be a relational database where the “Contractor ID” in the licensed contractors dataset would match a “Contractor ID” field in the permit dataset. But the licensed contractors dataset doesn’t have a unique ID field, and isn’t even on the data portal.

Instead, I’m finding contractor-to-project matches by looking for the first two or three words of the contractor’s name at the beginning of eight of the 16 name fields in the permit dataset. SODA works quickly on the query and it passes the results back to PHP in no time.
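
The name-prefix matching described above might look something like this sketch, written in Python rather than the PHP the site uses. It only builds the query string for a SODA request and makes no network call. The field names (`contractor_1_name` through `contractor_8_name`, and `issue_date`) are assumptions, so check the dataset's real column names before using it. `starts_with` and `upper` are SoQL functions, and `$where`, `$order`, and `$limit` are standard SODA parameters.

```python
from urllib.parse import urlencode

# Hypothetical field names; the real dataset's column names may differ.
NAME_FIELDS = [f"contractor_{i}_name" for i in range(1, 9)]


def name_prefix(contractor_name, words=2):
    """Keep the first few words of the name, minus trailing punctuation."""
    return " ".join(contractor_name.split()[:words]).rstrip(",.")


def build_where(contractor_name, words=2):
    """Build a SoQL clause matching the prefix at the start of any name field."""
    # Double any single quotes, the usual SQL escaping convention.
    prefix = name_prefix(contractor_name, words).upper().replace("'", "''")
    clauses = [f"starts_with(upper({field}), '{prefix}')" for field in NAME_FIELDS]
    return " OR ".join(clauses)


def build_query(contractor_name, limit=10):
    """Return a URL-encoded query string asking for the most recent permits."""
    return urlencode(
        {
            "$where": build_where(contractor_name),
            "$order": "issue_date DESC",  # assumed column name
            "$limit": limit,
        }
    )


query = build_query("180 Properties, LLC")
```

Matching on a prefix instead of the full name tolerates differences in suffixes like “LLC” or “Inc.”, at the cost of occasionally matching a different company whose name starts with the same words.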

In the future I’d like to pull in scores and reviews from Yelp and other sites that have APIs (Angie’s List and the Better Business Bureau don’t), as well as try to determine the name of the building – if it has one – by querying OpenStreetMap Nominatim.

© 2017 Steven Can Plan
