Tag: open data

Finding interesting data in the building permits dataset

I had several great conversations with fellow #chihacknight visitors at the 1871 tech hub (222 W Merchandise Mart Plaza) about how to reveal more information about what’s being built in Chicago. I had introduced Licensed Chicago Contractors at the previous week’s hack night, and tonight I showed changes I’ve made to the site, like how much faster it is now that I use DataTables’ server-side processing feature.
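In server-side processing mode, DataTables sends each redraw of the table to the server as a request. The request carries `draw`, `start`, `length` and `search[value]`, plus `order[i][column]` and `order[i][dir]` for sorting. DataTables expects JSON back with `draw`, `recordsTotal`, `recordsFiltered` and `data`. The site does this in PHP against MySQL. What follows is only a minimal Python sketch of the protocol. An in-memory list of rows stands in for the database table, and the function name `datatables_response` is hypothetical.

```python
import json

def datatables_response(params, rows):
    """Build the JSON reply to one DataTables server-side processing request.

    params: request parameters as strings, keyed the way DataTables names them.
    rows:   list of row lists; a stand-in for the MySQL table.
    """
    draw = int(params.get("draw", "0"))
    start = int(params.get("start", "0"))
    length = int(params.get("length", "10"))
    term = params.get("search[value]", "").strip().lower()

    # Global search: keep rows where any cell contains the search term.
    if term:
        filtered = [r for r in rows if any(term in str(c).lower() for c in r)]
    else:
        filtered = list(rows)

    # Sort on the first ordering column DataTables asks for, if any.
    if "order[0][column]" in params:
        col = int(params["order[0][column]"])
        descending = params.get("order[0][dir]", "asc") == "desc"
        filtered.sort(key=lambda r: str(r[col]).lower(), reverse=descending)

    # A length of -1 means "show all rows".
    page = filtered[start:] if length == -1 else filtered[start:start + length]
    return json.dumps({
        "draw": draw,                      # echoed so the client can match replies to requests
        "recordsTotal": len(rows),         # rows before filtering
        "recordsFiltered": len(filtered),  # rows after filtering
        "data": page,                      # only the requested page
    })

# Made-up rows for illustration.
rows = [["Acme Builders", "Chicago"], ["Zenith Construction", "Skokie"], ["Acme Roofing", "Evanston"]]
print(datatables_response({"draw": "1", "start": "0", "length": "10", "search[value]": "acme"}, rows))
```

Only one page of rows crosses the network per request. That is the usual reason server-side processing feels faster than loading every row into the browser at once.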

Some of the discussions resulted in suggestions to try new tools and methods that would make processing the data more efficient, or more revealing. What are the ways I can aggregate the data, or connect to similar data from other sources?

One of the new features I announced I’ll be adding is statistics on building activity by neighborhood. I started testing some queries to see the results, and to find the query that outputs that information in a way that’ll pique users’ interests.

I calculated the aggregate estimated costs of all building permit activity for the past 90 days in select neighborhoods. All of the data was automatically generated using a simple MySQL query, but one that will get faster after switching to Postgres. (I eliminated any project whose estimated cost was less than $1,000 because there are many project types that are $0 to several hundred dollars.)

  • Logan Square: 77 projects, totaling $16,295,997.50 at a $211,636.33 average cost
  • West Loop: 30 projects, totaling $27,646,899.00 at a $921,563.30 average cost
  • Andersonville: 6 projects, totaling $358,770.00 at a $59,795.00 average cost
  • Bronzeville: 34 projects, totaling $17,050,662.00 at a $501,490.06 average cost
  • Hyde Park: 20 projects, totaling $13,492,265.00 at a $674,613.25 average cost
  • Humboldt Park: 35 projects, totaling $41,917,988.00 at a $1,197,656.80 average cost
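Each line above is the output of one grouped query. The real table's layout isn't described here, so this minimal sketch assumes a hypothetical `permits` table with `neighborhood`, `issue_date` and `estimated_cost` columns. It uses Python's built-in SQLite so that it runs anywhere. In MySQL, the date filter would be written `issue_date >= CURDATE() - INTERVAL 90 DAY`. The sample rows are made up.

```python
import sqlite3

# Hypothetical schema; column names are assumptions for illustration.
STATS_SQL = """
SELECT neighborhood,
       COUNT(*)                      AS projects,
       SUM(estimated_cost)           AS total_cost,
       ROUND(AVG(estimated_cost), 2) AS average_cost
FROM permits
WHERE estimated_cost >= 1000             -- skip the $0 to several-hundred-dollar permits
  AND issue_date >= date(?, '-90 days')  -- last 90 days before the reference date
GROUP BY neighborhood
ORDER BY average_cost DESC
"""

def neighborhood_stats(conn, as_of):
    """Return (neighborhood, projects, total, average) rows as of an ISO date string."""
    return conn.execute(STATS_SQL, (as_of,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permits (neighborhood TEXT, issue_date TEXT, estimated_cost REAL)")
conn.executemany(
    "INSERT INTO permits VALUES (?, ?, ?)",
    [
        ("Logan Square", "2014-03-01", 250000),
        ("Logan Square", "2014-02-01", 150000),
        ("Logan Square", "2014-02-15", 500),     # under $1,000: excluded
        ("Logan Square", "2013-10-01", 90000),   # older than 90 days: excluded
        ("Andersonville", "2014-01-20", 60000),
    ],
)
for row in neighborhood_stats(conn, "2014-03-10"):
    print(row)
```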

How does Humboldt Park’s average come out at more than double most of the other neighborhoods’ averages? I think it’s pretty simple: this $40 million Salvation Army residence that’s going to be built at 825 N Christiana Avenue.

The results for Bronzeville were higher than I expected because this is a distressed neighborhood that has lost a lot of population and has seen little development in the past several years. This isn’t to say the neighborhood is poor – I saw a report last fall that highlighted how the purchasing power of Bronzeville residents was quite high relative to neighboring communities.

Ronnie Harris showed me the report when I participated in the Center for Neighborhood Technology’s civic app competition and hackathon. We, along with Josh Engel, designed Build It! Bronzeville, although my participation was really pushing them to develop Josh’s game idea more and construct a paper version of it. Our team won the competition and Ronnie and Josh have kept working on it (I saw them at last week’s hack night).

Projects that pushed up Bronzeville’s average included several multi-family homes at around $1.4 million each on the 4700 and 4800 blocks of S Calumet Avenue.

Code discussion

I can’t test for the “Loop” right now in the way I have my data structured because a LIKE ‘%loop%’ query of the database will include “West Loop” records.
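The problem is easy to reproduce. This minimal sketch uses Python's built-in SQLite, and the `permits` table and `neighborhood` column are hypothetical. A substring `LIKE` catches every neighborhood with "loop" in its name. An exact, case-insensitive comparison returns only the Loop. The second query assumes the neighborhood is stored in its own column rather than buried in a larger text field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permits (id INTEGER, neighborhood TEXT)")
conn.executemany(
    "INSERT INTO permits VALUES (?, ?)",
    [(1, "Loop"), (2, "West Loop"), (3, "South Loop"), (4, "Logan Square")],
)

# Substring match (case-insensitive for ASCII in SQLite): catches every name containing "loop".
fuzzy = conn.execute(
    "SELECT id FROM permits WHERE neighborhood LIKE '%loop%' ORDER BY id"
).fetchall()

# Exact, case-insensitive match on a dedicated column: only the Loop itself.
exact = conn.execute(
    "SELECT id FROM permits WHERE lower(neighborhood) = ? ORDER BY id", ("loop",)
).fetchall()

print(fuzzy)  # [(1,), (2,), (3,)]
print(exact)  # [(1,)]
```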

I need to make a small change to how the building permit data is stored in my database so that my site’s PHP codebase and MySQL queries can sift through the data faster. For example, I’m storing several key-value pairs as a JSON-encoded string in a TEXT field. One #chihacknight developer suggested I switch from MySQL to PostgreSQL because Postgres has native JSON-parsing functions.

I looked up how to use Postgres’s JSON functions and realized that, yes, I probably should do that, but that I also need to change the array structure of the data I’m encoding to JSON. In other words, with a tiny change now, I can be better prepared for the eventual migration to Postgres.
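The post doesn't spell out the exact structural change, so the following is only a guess at the general idea, with made-up field names. Suppose the key-value pairs are encoded as a list of `[key, value]` pairs. Then every lookup has to scan the list. A flat JSON object can be read by key directly, which is the shape Postgres's `->>` operator reads when given a text key (for example, `details->>'work_type'`).

```python
import json

# Hypothetical "before": pairs encoded as a list of [key, value] lists.
before = json.dumps([["work_type", "Renovation"], ["estimated_cost", 80000]])
# Hypothetical "after": the same pairs encoded as one flat JSON object.
after = json.dumps({"work_type": "Renovation", "estimated_cost": 80000})

def lookup_pairs(encoded, key):
    """Scan every pair until the key turns up."""
    for k, v in json.loads(encoded):
        if k == key:
            return v
    return None

def lookup_object(encoded, key):
    """Read the key directly, as ->> does on a JSON object."""
    return json.loads(encoded).get(key)

print(lookup_pairs(before, "work_type"), lookup_object(after, "work_type"))
```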

Using open data: Showing what projects licensed Chicago contractors are working on

The New City developer recently received permits for nearly $50 million of construction work across from the Lincoln Park REI.

I wrote in my last post that I found “pain” in the process of finding a licensed contractor in the city (the pain of finding one who can install in the public way remains unmedicated).

I wanted to provide more than a list (and a map), and EveryBlock has already answered “What’s going on across the street from my house?” I wanted to add value by helping people answer the question, “What contractor should I choose?”

Several other sites help you do this, like BuildZoom, Angie’s List, and the Better Business Bureau, by showing you customer reviews or complaints. I wanted something other than a copy of a review site (a lot of the businesses are also on Yelp), so I decided to answer the question, “What projects have these companies done?”

That’s where the City of Chicago’s open data portal comes in: it has a dataset for Building Permits.

Check out 180 Properties, LLC from Skokie, Illinois. They’ve had two permits issued within the last three months. One project, at 3705 N Hoyne Avenue, is for interior renovation: “Remove/replace cabinets, countertops, flooring, patch & repair drywall”. The estimated cost for the project is $80,000. Sound like the kind of contractor you’re looking for? Call them up or keep researching.

You can even see who else is working on this project. Burnham Nationwide is listed as an expeditor on this project which means they’re likely acting as the intermediary between the Chicago Department of Buildings and the companies actually doing the work. Burnham will do site plans, drawings, occupancy, and ensure everything is in order. The property owner is also listed in the permit information.

For people who want to explore construction activity the other way around, finding projects before contractors, I created a “Permits explorer” page. This page searches the Building Permits dataset to show the most recently issued permits for the most expensive projects. Right now a project to alter and renovate Chicago Vocational High School at 2100 E 87th Street has an estimated cost of $40 million. I didn’t realize how much the Department of Buildings is funded by permits until I saw the permit fees.

The permit fee for the school renovation would have been $372,598, but the dataset said the entire fee was waived (likely because it’s a Chicago Public School). Other projects I reviewed had permit fees between $30,000 and $75,000.

Real estate speculators, development watchers, and editors of Curbed Chicago should find browsing permits useful. The list includes two projects associated with the New City development at Halsted Street and Clybourn Avenue, across from the Lincoln Park REI store. The two permits are held by 1515 N Halsted, LLC. The first is for a “3 story steel framed mixed-use retail, restuarant, assembly (movie theater) building” at 1500 N Clybourn Avenue (for an estimated cost of $26,403,193), and the second permit describes a 7 story parking garage at 710 W Schiller Street (for $21,518,012).

How it works

I used my programming magic – I prefer PHP – to query the Socrata Open Data API (or SODA) to look for the given contractor’s name in one of eight name fields (there are 16 name fields) and then return information about the most recent permits. The Building Permits dataset gives the project location, work description, and its estimated cost. I figured you could use the project’s estimated cost to gauge the kind of work the contractor does – is the contractor more familiar with big jobs, or little jobs?

This method isn’t the best. Ideally there’d be a relational database where the “Contractor ID” in the licensed contractors dataset would match a “Contractor ID” field in the permit dataset. But the licensed contractors dataset doesn’t have a unique ID field, and isn’t even on the data portal.

Instead, I’m finding contractor-to-project matches by looking for the first two or three words of the contractor’s name at the beginning of eight of the 16 name fields in the permit dataset. SODA runs the query quickly and passes the results back to PHP in no time.
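Here is a minimal Python sketch of that kind of prefix search; the site itself uses PHP. It builds a SODA request with SoQL's `$where`, `$order` and `$limit` parameters, OR-ing together one `LIKE 'PREFIX%'` test per name field. Several parts are assumptions:

- The endpoint is a placeholder.
- The `contact_N_name` and `issue_date` column names are guesses.
- Names in the dataset are assumed to be stored in uppercase.
- Stripping punctuation from the prefix is a simplification that keeps quote characters out of the SoQL string.

```python
import re
from urllib.parse import urlencode

# Placeholder: replace DATASET-ID with the Building Permits dataset's identifier.
ENDPOINT = "https://data.cityofchicago.org/resource/DATASET-ID.json"
# Hypothetical column names for the eight name fields being searched.
NAME_FIELDS = [f"contact_{i}_name" for i in range(1, 9)]

def name_prefix(name, words=2):
    """First few words of a contractor's name, uppercased, punctuation removed."""
    cleaned = re.sub(r"[^A-Za-z0-9& ]", " ", name)
    return " ".join(cleaned.upper().split()[:words])

def permit_query(contractor, limit=20):
    """SoQL parameters matching the name prefix at the start of any name field."""
    prefix = name_prefix(contractor)
    where = " OR ".join(f"{field} LIKE '{prefix}%'" for field in NAME_FIELDS)
    return {"$where": where, "$order": "issue_date DESC", "$limit": limit}

def permit_url(contractor, limit=20):
    """Full request URL with the SoQL parameters percent-encoded."""
    return ENDPOINT + "?" + urlencode(permit_query(contractor, limit))

print(permit_url("180 Properties, LLC"))
```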

In the future I’d like to pull in scores and reviews from Yelp and other sites that have APIs (Angie’s List and the Better Business Bureau don’t), as well as try to determine the name of the building – if it has one – by querying OpenStreetMap Nominatim.

Outta left field: I recreated the city’s contractor listing website

LicensedChicagoContractors.com looks good and works quickly on mobile devices.

I’m working on a secret project to get something installed on the public way. The process to find out how to do it is as arduous as getting it done because you never finish learning the process. Every time you think you’ve figured something out, there’s something else.

To get the secret project installed I need a licensed contractor. Not only do I need a licensed contractor, but they must have the license to do work in the public way (versus doing work at your private property).

The Chicago Department of Buildings publishes a continually updated list of licensed contractors on its website but it’s annoying to use. There’s no search, no permanent links, and if you leave the window open long enough this weird session manager kicks in and stops you from browsing to the next page of results.

I asked my followers on Twitter the best way to scrape the data. The ever-amusing Dan O’Neil, who leads the Smart Chicago Collaborative (which hosts the Chicago Crash Browser), recommended just copying and pasting all 10 pages. That would work fine the first time, but I might need to do it again when the data updates. Nick Bennett jumped in and used Selenium, a tool that automates web browsers. He said, “it’s inefficient but for a small job like that I figured why bother with something faster”.

I imported the data into a MySQL table and ran through some of my “standard” data cleaning methods (like trimming leading and trailing spaces, removing odd characters, and extracting good information into other columns, like phone numbers and ZIP codes).
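The cleanup itself was done on the MySQL table. As a rough equivalent, here is a minimal Python sketch of the same three steps. The regular expressions assume US-style ten-digit phone numbers and five-digit ZIP codes with an optional ZIP+4 suffix. The function names and sample strings are made up.

```python
import re

# US-style phone numbers: (312) 555-0100, 312.555.0100, 3125550100, ...
PHONE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")
# Five-digit ZIP codes, optionally followed by a ZIP+4 suffix.
ZIP = re.compile(r"\b(\d{5})(?:-\d{4})?\b")

def clean(value):
    """Replace non-printable characters with spaces, then trim and collapse whitespace."""
    value = "".join(ch if ch.isprintable() else " " for ch in value)
    return " ".join(value.split())

def extract_phone(text):
    """Return the first phone number formatted as 312-555-0100, or None."""
    m = PHONE.search(text)
    return "{}-{}-{}".format(*m.groups()) if m else None

def extract_zip(text):
    """Return the last five-digit ZIP code in the text (addresses end with it), or None."""
    matches = ZIP.findall(text)
    return matches[-1] if matches else None

print(clean("  ACME\tBUILDERS \x00 INC  "))
print(extract_phone("Tel: (312) 555-0100"))
print(extract_zip("222 W Merchandise Mart Plaza, Chicago, IL 60654"))
```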

With PHP – my favorite web language – I created a single-page website that loads all 3,930 licensed general contractors extremely fast and uses the DataTables JavaScript library to add search and sort to the table. I used Bootstrap to make a responsive design, meaning it adjusts to fit multiple screen sizes, including smartphones and tablets.

I call it LicensedChicagoContractors.com.

The new website still doesn’t solve my problem of finding a company that can do work in the public way – I’m still working on this. The last online dataset I could find is on the city’s old http://egov.cityofchicago.org domain, and was cached by the Internet Archive’s Wayback Machine on January 25, 2010. Ideally this information – plumbers, public way, and general contractors – should be posted on the City’s data portal.

One day left to enter the Divvy Data Challenge

Divvy dock post-polar vortex

Divvy bikes have been covered in snow frequently this winter. Photo by Jennifer Davis.

As self-proclaimed Divvy Data Brigade Captain* in Chicago’s #opendata and #opengov community I must tell you that all Divvy Data Challenge submissions are due tomorrow, Tuesday, March 11. Divvy posted:

Help us illustrate the answers to questions such as: Where are riders going? When are they going there? How far do they ride? What are top stations? What interesting usage patterns emerge? What can the data reveal about how Chicago gets around on Divvy?

We’re interested in infographics, maps, images, animations, or websites that can help answer questions and reveal patterns in Divvy usage. We’re looking for entries to tell us something new about these trips and show us what they look like.

I’ve seen a handful of the entries so far, including some to which I’ve contributed, and I’m impressed. When the deadline passes I’ll feature my favorites.

Want to play with the data? You should start with these resources, in order:

  1. Divvy Data Challenge – rules and data download
  2. divvy-munging – download an enhanced version of Divvy’s data, with input from several #ChiHackNight hackers
  3. Bike Sharing Data Hackpad – this is where I’m consolidating all of the links to projects, visualizations, analysis, data, and blog posts.
  4. Divvy Data Google Group – a discussion group with over 25 members
  5. #DivvyData – chat on Twitter

It’s not too late to get started now on a project about the bikes themselves. Nick Bennett has crunched the numbers on the bikes’ activity and posted them to the Divvy Data Google Group. Want to use his data and initial analysis? He said “run with it”.
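Here is a minimal Python sketch of the kind of tally involved, for anyone starting on the bikes themselves. It counts trips and riding time per bike, plus trip starts per station. The column names (`bikeid`, `tripduration` in seconds, `from_station_name`) are assumptions about the trip CSV's header, so check them against the file you download. The three sample rows are made up.

```python
import csv
import io
from collections import Counter

def summarize_trips(csv_file):
    """Tally trips and riding seconds per bike, and trip starts per station."""
    trips_per_bike = Counter()
    seconds_per_bike = Counter()
    starts_per_station = Counter()
    for row in csv.DictReader(csv_file):
        bike = row["bikeid"]
        trips_per_bike[bike] += 1
        # Strip thousands separators in case the export includes them.
        seconds_per_bike[bike] += float(row["tripduration"].replace(",", ""))
        starts_per_station[row["from_station_name"]] += 1
    return trips_per_bike, seconds_per_bike, starts_per_station

# Made-up sample rows; for real data, pass open("trips.csv", newline="") (hypothetical filename).
sample = io.StringIO(
    "trip_id,bikeid,tripduration,from_station_name\n"
    "1,480,316,State St & Harrison St\n"
    "2,77,723,Clinton St & Madison St\n"
    "3,480,1200,Clinton St & Madison St\n"
)
trips, seconds, stations = summarize_trips(sample)
print(trips.most_common(1))     # [('480', 2)]
print(stations.most_common(1))  # [('Clinton St & Madison St', 2)]
```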

Share your work ahead of time and leave a comment with a link to your project.

* This title is a play on Christopher Whitaker’s position as Code for America Brigade Captain and all-around awesome doer, who keeps track of everything that’s going on in these communities and publishes event write-ups on Smart Chicago Collaborative.

How Chicagoans commute map: An interview with the cartographer

Chicago Commute Map by Transitized

A screenshot of the map showing Lakeview and the Brown, Red, Purple and Purple Line Express stations.

Shaun Jacobsen blogs at Transitized.com and yesterday published the How Chicagoans Commute map. I emailed him to learn why and how he made it, and what it reveals about Chicago and transit. The map colors each census tract according to its simple-majority commuting mode.

What got you started on it?

It was your post about the Census data and breaking it down by ZIP code to show people how many homes have cars. I’ve used that method a few times. The method of looking up each case each time it came up took too long, so this kind of puts it in one place.

What story did you want to tell?

I wanted to demonstrate that many households in the city don’t have any cars at all, and these residents need to be planned for as well. What I really liked was how the north side transit lines stuck out. Those clearly have an impact on how people commute, but I wonder what the cause is. Are the Red and Brown Lines really good lines (in people’s opinions) so they take them, or are people deciding to live closer to the lines because they want to use them (because they work downtown, for example)?

I decided to post the map on Thursday because I was writing a story about a proposed development in Uptown and wanted information on how many people around that development had cars. As the map shows, almost all of Uptown is transit-commuting, and a lot of us don’t even own any cars.

What data and tools did you use?

I first used the Chicago Data Portal to grab the census tract boundaries. Then I grabbed all of the census data for B08141 (“means of transportation to work by number of vehicles available”) and DP04 (“selected housing characteristics”) for each tract and combined them using the tract ID and Excel’s VLOOKUP formula.
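The VLOOKUP step is a join on the tract ID. Here is a minimal Python sketch of that join, followed by picking each tract's top commuting mode. The tract IDs, counts and column names (`tract_id`, `drove`, `transit`, `walked`) are made up. `majority_mode` returns the mode with the largest count, which is a plurality, and it is an assumption that this matches the map's "simple majority" rule.

```python
def join_on_tract(boundaries, commute_rows):
    """Attach each tract's commute counts to its boundary record (a VLOOKUP on tract_id)."""
    by_tract = {row["tract_id"]: row for row in commute_rows}
    # Tracts with no matching commute row keep only their boundary fields.
    return [{**b, **by_tract.get(b["tract_id"], {})} for b in boundaries]

def majority_mode(row, modes):
    """Return the mode with the largest count; ties go to the earlier mode in `modes`."""
    return max(modes, key=lambda m: row.get(m, 0))

MODES = ["drove", "transit", "walked"]
boundaries = [{"tract_id": "T1"}, {"tract_id": "T2"}]
commute = [
    {"tract_id": "T1", "drove": 120, "transit": 310, "walked": 45},
    {"tract_id": "T2", "drove": 400, "transit": 150, "walked": 20},
]
for tract in join_on_tract(boundaries, commute):
    print(tract["tract_id"], majority_mode(tract, MODES))
```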

Read the rest of this interview on Web Map Academy.

Getting a little closer to understanding Chicago’s pothole-filling performance status

Tom Kompare updated his web application, which tracks the progress of pothole repairs using information from the city’s data portal, in response to my question about how many potholes the city fills within 72 hours, the Chicago Department of Transportation’s performance measure.

He wrote to me via the Open Government Chicago group:

Without completely rewriting http://potholes.311services.org, I added a count of the number of open (not yet addressed) pothole repair tickets (requests) that exceed 3 days old. As of today, the data from the City of Chicago’s Data Portal shows 1,334 of the 1,404 open tickets in the 311 system are older than three days.

Full disclosure: The web app actually looks for greater than 4 days old. The Data Portal’s pothole data are only updated once a day, so these data are always a day old. 4 – 1 = 3.

Keep in mind that this web app only shows how many are yet to be addressed, and does not count how many have been patched within CDOT’s 3-day goal during some arbitrary time period. That is a much more intense calculation than this pure client-side JavaScript web application can handle due to bandwidth restrictions on mobile (3G/4G). This web app already pushes the mobile envelope with the amount of data downloaded. I can fix that, but, again, not without a rewrite.

Still, 1,334 open repair requests (12/16/2013 Data Portal data) is quite different from the number of open repair requests reported by CDOT on 12/16/2013 (560 in alleys, 193 on streets). I’m not sure what accounts for the difference.
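Tom's app is client-side JavaScript. As a minimal Python sketch of the same threshold arithmetic, this counts open requests past the 3-day goal while allowing for a snapshot that is one day old. The `created` field on each open request is hypothetical, and the dates are made up.

```python
from datetime import date, timedelta

def count_stale(open_requests, today, goal_days=3, lag_days=1):
    """Count open requests older than the goal, allowing for a stale snapshot.

    With goal_days=3 and lag_days=1 this flags requests more than 4 days old,
    the "4 - 1 = 3" adjustment for data refreshed once a day.
    """
    cutoff = today - timedelta(days=goal_days + lag_days)
    return sum(1 for r in open_requests if r["created"] < cutoff)

# Made-up open requests, keyed by a hypothetical "created" date field.
requests = [{"created": date(2013, 12, d)} for d in (10, 11, 12, 15)]
print(count_stale(requests, today=date(2013, 12, 16)))  # 2
```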

This reminds me of a third issue with the way CDOT is presenting pothole performance data online (the first being that it’s PDF, the second that it doesn’t work in Safari). The six PDF files are overwritten for every new day of data. If you want information from two days ago, well you better have downloaded the PDF from two days ago!

CDOT misses the lesson on open data transparency

Publishing the wrong measurement as a PDF isn’t transparency.

The Chicago Department of Transportation released the first progress report to its Chicago Forward Action Agenda in October, two and a half years after the plan – the first of its kind – was published. I’ve spent an inordinate amount of time reading it and putting off a review. Why? It’s been difficult to compare the original and the update. The update is extremely light on specifics and details for the many goals in the Action Agenda, which should have organizational (like record keeping and efficiency improvements) and public impacts (like figuring out which intersections have the most crashes). I’ll publish my in-depth review this week.

Aside from missing specifics and details, the update presents information differently and is missing status updates for the three to five “performance measures” in each chapter. It was difficult to understand CDOT’s reported progress without holding the original and the update side-by-side. I think listing the original action item, the progress symbol, and then a status update would have been an easier way to read the document.

The update measures some action items differently than originally called for, and the way pothole repair was presented, a problem for people bicycling and driving, caught my analytical eye.

CDOT’s pothole-filling performance measure is the percentage of potholes “patched or fixed within 72 hours of being reported,” which it wants to increase. But according to the website Chicago Potholes, which tracks the city’s open data, the average time to fill a pothole is 101 days*. The update doesn’t really explain why, saying only that “the 72 hour goal for filling potholes is not always feasible due to asphalt plant schedules,” and it says nothing about the performance measure itself.

As originally written, the only way to report performance would be to list the percentage of potholes filled within the goal time, both at the beginning and in the update. This performance measure has a complementary action item – an online dashboard – which could have provided the answer, but didn’t.
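Here is a rough idea of what reporting that measure would involve. This minimal Python sketch assumes you have a reported timestamp and a completed timestamp for each closed request; the inputs are made up. It computes the share closed within 72 hours. It also computes the mean, median and sample standard deviation of repair time in days.

```python
from datetime import datetime, timedelta
from statistics import mean, median, stdev

GOAL = timedelta(hours=72)

def pothole_performance(requests):
    """Summarize closed requests given as (reported, completed) datetime pairs."""
    if not requests:
        raise ValueError("need at least one closed request")
    durations = [completed - reported for reported, completed in requests]
    days = [d.total_seconds() / 86400 for d in durations]
    within_goal = sum(1 for d in durations if d <= GOAL)
    return {
        "pct_within_72h": 100 * within_goal / len(durations),
        "mean_days": mean(days),
        "median_days": median(days),
        # Sample standard deviation needs at least two values.
        "stdev_days": stdev(days) if len(days) > 1 else 0.0,
    }

t0 = datetime(2013, 12, 1, 8, 0)  # made-up report time
print(pothole_performance([(t0, t0 + timedelta(days=d)) for d in (1, 2, 5, 100)]))
```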

CDOT published that dashboard this summer as a series of six PDF files that update daily and you can hardly call it useful.

Publishing PDF files in the day and age of open government data – popular with President Obama and Mayor Rahm Emanuel – is unacceptable. Even if they are accessible – meaning you can copy/paste the text – they are poor outlets for data given the nationally-renowned civic innovation changes that Emanuel has succeeded in establishing.

There’s another problem: the dashboard file for pothole tracking doesn’t track the time it takes to close a pothole request, nor the number of pothole requests that are patched within 72 hours. It simply tells the number completed yesterday, the year to date, and the number of unpatched requests. (I’ve posted the pothole-tracking file to Scribd because the dashboard [PDF] doesn’t work in Safari; I also notified city staff of this problem, which they acknowledged over three weeks ago.)

The “Chicago Works For You” website reports a different metric, that of the number of requests made each day, distributed by ward.

I discussed the proposed dashboard with former commissioner Gabe Klein over two years ago. He said he wanted to create a dashboard of projects “we’re working on that’s updated once a week.” Given how accessible Klein was to me, John Greenfield, and other reporters, I’ll give him and CDOT a pass for not doing this. But Klein also said, “I’m really big on transparency and good communication. When I left [Washington,] D.C. our [Freedom of Information Act requests] were dramatically lowered.”

I’ll consider the pothole performance measure and action item “in need of major progress.”

* For stats geeks, the median is 86 days and the standard deviation is 84 days.

Converting shapefiles to GeoJSON, and other format conversions

To develop the Chicago Bike Map app, I had a problem I thought would be simple to solve: load train lines into a Leaflet-powered map. I had the train lines stored as a polyline shapefile but Leaflet can only read the GeoJSON format or a string of geographic coordinates representing lines.
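Leaflet expects a specific structure. A GeoJSON file for a set of lines is a `FeatureCollection` of `LineString` features whose coordinates are `[longitude, latitude]` pairs in WGS84. This minimal Python sketch writes that structure from coordinates you already have. It doesn't read shapefiles; that needs a library or a converter. It also assumes the coordinates have already been reprojected to longitude/latitude. The sample line and its `name` property are made up.

```python
import json

def polyline_to_geojson(lines):
    """Build a GeoJSON FeatureCollection string of LineString features.

    lines: list of (properties_dict, [(lon, lat), ...]) tuples.
    GeoJSON orders each coordinate as [longitude, latitude].
    """
    features = [
        {
            "type": "Feature",
            "properties": props,
            "geometry": {"type": "LineString", "coordinates": [list(pt) for pt in coords]},
        }
        for props, coords in lines
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})

sample = [({"name": "Sample line"}, [(-87.6278, 41.8819), (-87.6298, 41.8857)])]
print(polyline_to_geojson(sample))
```

In Leaflet, the parsed object can be passed to `L.geoJSON(...)`, which is spelled `L.geoJson` in pre-1.0 releases.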

I eventually found a solution (I can’t remember how) and I need to share it with you. The converter can do more than ESRI shapefiles to GeoJSON. It can reproject the data in the conversion. It can convert from several formats to several other formats.

The site is called MyGeodata Converter. You upload a ZIP file of geographic files: .shp and its companion files (.prj, .dbf, .shx), .kml, or .gpx. Let’s take the Chicago Transit Authority train lines shapefile straight from the City of Chicago’s open data portal. It downloads as a zipped collection of a shapefile and its buddies, and we can take this file straight to the Converter and upload it. The Converter will unzip it and read the data; it will even identify the projection system. (For Chicago-based geographic data, it’s common to use NAD83 Illinois StatePlane East FIPS 1201 Feet, SRID 102671, the same as SRID 3435.)

The Converter can output any of the following formats, in the same or a new projection, and it accepts SQL statements to extract a subset of the data:

  • ESRI shapefile
  • GML
  • KML, KMZ
  • GeoJSON
  • Microstation DGN
  • MapInfo File
  • GPX
  • CSV

How to split a bike lane in two and copy features with QGIS

A screenshot of the splash image seen by users with iPad Retina displays in landscape mode.

To make the Chicago Offline Bike Map, I need bikeways data. I got this from the City of Chicago’s data portal, in GIS shapefile format. It has a good attribute table listing the name of the street the bikeway is on and the bikeway’s class (see below). After several bike lanes had been installed, I asked the City’s data portal operators for an updated shapefile. I got it a month later and found that it wasn’t up-to-date. I probably could have received a shapefile with the current bikeway installations marked, but I didn’t have time to wait: every day delayed was one more day I couldn’t promote my app; I make 70 cents per sale.

Since the bikeway lines were already there, I could simply reclassify the sections that had been changed to an upgraded form of bikeway (for example, Wabash Avenue went from a door zone-style bike lane to a buffered bike lane in 2011). I tried to do this but ran into trouble when the line segment was longer than the bikeway segment that needed to be reclassified (for example, Elston Avenue has varying classifications from Milwaukee Avenue to North Avenue that didn’t match the line segments for that street). I had to divide the bikeway into shorter segments and reclassify them individually.

Enter the Split Features tool. QGIS is short on documentation and I had trouble using this feature. I eventually found the trick after a search that took more time than I expected. Here’s how to cut a line:

  1. Select the line using one of the selection tools. I prefer the default one, Select Features, where you have to click on the feature one-by-one. (It’s not required that you select the line, but doing so will ensure you only cut the selected line. If you don’t select the line, you can cut many lines in one go.)
  2. Toggle editing on the layer that contains the line you want to cut.
  3. Click Edit>Split Features to activate that tool, or find its icon in one of the toolbars (which may or may not be shown).
  4. Click once near where you want to split the line.
  5. Move the cursor across the line you want to split, in the desired split location.
  6. When the red line indicating your split is where you want it, right-click.

Your line segment has now been split. A new entry has been added to the attribute table: there are now two entries with identical attributes that together make up the original line segment.

This screenshot shows a red line across a road. The red line indicates where the road will be split. Right-click to tell QGIS to “split now”.

After splitting, open the attribute table to see that you now have two features with identical attributes. 
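QGIS does the split interactively, with a cutting line you draw. For anyone who would rather script it, here is a different, minimal approach in Python. It splits a line at a given fraction of its length and gives both halves a copy of the original attributes, which mirrors the duplicated attribute rows QGIS creates. It assumes projected, planar coordinates (such as State Plane feet) rather than longitude/latitude. The feature layout (`coords`, `attrs`) is hypothetical.

```python
from math import hypot

def split_line(feature, fraction):
    """Split a line feature at `fraction` of its length, 0 < fraction < 1.

    feature: {"coords": [(x, y), ...], "attrs": {...}} in planar units.
    Returns two features; each gets its own copy of the original attributes.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    coords = feature["coords"]
    seg_lengths = [hypot(x2 - x1, y2 - y1)
                   for (x1, y1), (x2, y2) in zip(coords, coords[1:])]
    target = fraction * sum(seg_lengths)
    walked = 0.0
    for i, seg in enumerate(seg_lengths):
        if seg > 0 and walked + seg >= target:
            # Interpolate the cut point along segment i.
            t = (target - walked) / seg
            (x1, y1), (x2, y2) = coords[i], coords[i + 1]
            cut = (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
            first = {"coords": coords[: i + 1] + [cut], "attrs": dict(feature["attrs"])}
            second = {"coords": [cut] + coords[i + 1:], "attrs": dict(feature["attrs"])}
            return first, second
        walked += seg
    raise ValueError("line has zero length")

street = {"coords": [(0, 0), (10, 0), (10, 10)], "attrs": {"street": "ELSTON", "type": 1}}
a, b = split_line(street, 0.25)
print(a["coords"], b["coords"])
```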

Copying features in QGIS

A second issue I had when creating new bikeways data was when a bikeway didn’t exist and I couldn’t reclassify it. This was the case on Franklin Boulevard: no bikeway had ever been installed there. I solved this problem by copying the relevant street segments from the Transportation (roads) shapefile and pasting them into the bikeways shapefile. New entries were created in the attribute table but with blank attributes. It was simple to fill in the street name, class, and extents.

Chicago bikeways GIS description

Bikeway classes (TYPE in the dataset) in the City of Chicago data portal are:

  1. Existing bike lane
  2. Existing marked shared lane
  3. Proposed on-street bikeway
  4. Recommended bike route
  5. Existing trail
  6. Proposed off-street trail
  7. Access path (to existing trail)
  8. Existing cycle track (also known as protected bike lane)
  9. Existing buffered bike lane

It remains to be seen if the City will identify the “enhanced marked shared lane” on Wells Street between Wacker Drive and Van Buren Street differently than “existing marked shared lane” in the data.

Dumke fighting the open data fight for Chicagoans

Dan O’Neil mails a FOIA request to Chicago’s 311 service in 2007. Now, you can email most places (or fax!). 

I like to say that for every dataset a government agency proactively publishes, there’s one fewer FOIA* request it has to respond to.

City officials say they get so many FOIA requests that responding to them all has become a serious resource drain. But this is one of the reasons why—we don’t have any other way to get information about our government.

As a result, I will be adding to their workload and submitting another FOIA request. I don’t mind saying this publicly since it won’t be a secret anyway. That’s because the Emanuel administration has resumed Daley’s old habit of posting FOIA requests online. It’s also kept up Daley’s habit of not posting any information showing how responsive the city is.

That’s Chicago Reader author Mick Dumke talking about his troubles obtaining some data from the Chicago Department of Human Resources. Read the entire article, where he also gives a pretty good description of the “Chicago FOIA way”, the process for getting information in Mayor Emanuel’s transparent administration.

Note: I submit a FOIA request to some agency at least once a month. My most frequent FOIA requests go to the Chicago Transit Authority (CTA) and the Chicago Department of Transportation (CDOT). I also query the Chicago Police Department and the Department of Administrative Hearings. Derek Eder has a story on how he and his colleagues worked with some Chicago staff to add new data about lobbying to the Chicago Data Portal.

*Freedom of Information Act. In New York, it’s called FOIL, or Freedom of Information Law.

© 2017 Steven Can Plan
