Tag: open data

Outta left field: I recreated the city’s contractor listing website


LicensedChicagoContractors.com looks good and works quickly on mobile devices.

I’m working on a secret project to get something installed on the public way. Figuring out how to do it is as arduous as getting it done, because you never finish learning the process. Every time you think you’ve figured something out, there’s something else.

To get the secret project installed I need a licensed contractor. Not only do I need a licensed contractor, but they must have the license to do work in the public way (versus doing work at your private property).

The Chicago Department of Buildings publishes a continually updated list of licensed contractors on its website but it’s annoying to use. There’s no search, no permanent links, and if you leave the window open long enough this weird session manager kicks in and stops you from browsing to the next page of results.

I asked my Twitter followers for the best way to scrape the data. The ever-amusing Dan O’Neill, who leads the Smart Chicago Collaborative (which hosts the Chicago Crash Browser), recommended just copying and pasting all 10 pages. That would work fine the first time, but I might need to do it again when the data updates. Nick Bennett jumped in and used Selenium, a tool that automates web browsers. He said, “it’s inefficient but for a small job like that I figured why bother with something faster”.

I imported the data into a MySQL table and ran through some of my “standard” data cleaning methods (like trimming leading and trailing spaces, removing odd characters, and extracting good information into other columns, like phone numbers and ZIP codes).
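The cleanup steps above can be sketched in Python. This is not the MySQL and PHP workflow the post describes; it’s a minimal sketch assuming each scraped row arrives as a dictionary with hypothetical `name`, `address` and `phone` fields, a 10-digit phone number, and a five-digit ZIP code at the end of the address.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns; the real Department of Buildings export may be formatted differently.
PHONE_RE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")
ZIP_RE = re.compile(r"(\d{5})(?:-\d{4})?$")  # ZIP (or ZIP+4) at the very end of the address


def clean_text(value: str) -> str:
    """Collapse whitespace (incl. non-breaking spaces), drop invisible characters, trim the ends."""
    value = re.sub(r"\s+", " ", value)
    value = "".join(ch for ch in value if ch.isprintable())
    return value.strip()


def extract_phone(value: str) -> Optional[str]:
    """Normalize the first 10-digit phone number found to 312-555-0100 style."""
    match = PHONE_RE.search(value)
    return "-".join(match.groups()) if match else None


def extract_zip(address: str) -> Optional[str]:
    """Return the five-digit ZIP code that ends a cleaned address, if any."""
    match = ZIP_RE.search(address)
    return match.group(1) if match else None


def clean_row(row: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Clean every field, then derive normalized phone and ZIP columns (field names are hypothetical)."""
    cleaned: Dict[str, Optional[str]] = {key: clean_text(val) for key, val in row.items()}
    cleaned["phone"] = extract_phone(row.get("phone", ""))
    cleaned["zip"] = extract_zip(clean_text(row.get("address", "")))
    return cleaned


# A made-up row, not real contractor data.
raw = {"name": "  ACME BUILDERS INC\u00a0", "address": "123 N MAIN ST  CHICAGO IL 60601 ", "phone": "(312) 555-0100"}
print(clean_row(raw))
# {'name': 'ACME BUILDERS INC', 'address': '123 N MAIN ST CHICAGO IL 60601', 'phone': '312-555-0100', 'zip': '60601'}
```

Anchoring the ZIP pattern to the end of the address keeps a five-digit street number (common on Chicago’s South and West Sides) from being mistaken for a ZIP code.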

With PHP – my favorite web language – I created a single-page website that loads all 3,930 licensed general contractors extremely fast and uses the DataTables JavaScript library to add search and sort to the table. I used Bootstrap to make the design responsive, meaning it adjusts to fit multiple screen sizes, including smartphones and tablets.

I call it LicensedChicagoContractors.com.

The new website still doesn’t solve my problem of finding a company that can do work in the public way – I’m still working on this. The last online dataset I could find is on the city’s old http://egov.cityofchicago.org domain, and was cached by the Internet Archive’s Wayback Machine on January 25, 2010. Ideally this information – plumbers, public way, and general contractors – should be posted on the City’s data portal.

One day left to enter the Divvy Data Challenge

Divvy dock post-polar vortex

Divvy bikes have been covered in snow frequently this winter. Photo by Jennifer Davis.

As self-proclaimed Divvy Data Brigade Captain* in Chicago’s #opendata and #opengov community I must tell you that all Divvy Data Challenge submissions are due tomorrow, Tuesday, March 11. Divvy posted:

Help us illustrate the answers to questions such as: Where are riders going? When are they going there? How far do they ride? What are top stations? What interesting usage patterns emerge? What can the data reveal about how Chicago gets around on Divvy?

We’re interested in infographics, maps, images, animations, or websites that can help answer questions and reveal patterns in Divvy usage. We’re looking for entries to tell us something new about these trips and show us what they look like.

I’ve seen a handful of the entries so far, including some to which I’ve contributed, and I’m impressed. When the deadline passes I’ll feature my favorites.

Want to play with the data? You should start with these resources, in order:

  1. Divvy Data Challenge – rules and data download
  2. divvy-munging – download an enhanced version of Divvy’s data, with input from several #ChiHackNight hackers
  3. Bike Sharing Data Hackpad – this is where I’m consolidating all of the links to projects, visualizations, analysis, data, and blog posts.
  4. Divvy Data Google Group – a discussion group with over 25 members
  5. #DivvyData – chat on Twitter

It’s not too late to start a project about the bikes themselves. Nick Bennett has crunched the numbers on the bikes’ activity and posted them to the Divvy Data Google Group. Want to use his data and initial analysis? He said, “run with it.”

Share your work ahead of time and leave a comment with a link to your project.

* This title is a play on Christopher Whitaker’s position as Code for America Brigade Captain and his all-around awesome work keeping track of everything that’s going on in these communities and publishing event write-ups on Smart Chicago Collaborative.

How Chicagoans commute map: An interview with the cartographer

Chicago Commute Map by Transitized

A screenshot of the map showing Lakeview and the Brown, Red, Purple and Purple Line Express stations.

Shaun Jacobsen blogs at Transitized.com and yesterday published the How Chicagoans Commute map. I emailed him to learn more about why and how he made it, and what it reveals about Chicago and transit. The map colors each census tract according to the commuting mode used by a simple majority of its residents.
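Choosing a tract’s color comes down to finding the commute mode with the most workers. Here is a minimal Python sketch, assuming per-tract counts keyed by hypothetical mode names; the interview doesn’t say how ties or sub-50% pluralities were handled, so the function also returns the winning share, letting you check whether it’s an outright majority.

```python
from typing import Dict, Tuple


def dominant_mode(mode_counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the commute mode with the most workers and its share of all workers in the tract.

    Ties go to whichever mode appears first in the dictionary.
    """
    total = sum(mode_counts.values())
    if total == 0:
        raise ValueError("tract has no commuters")
    mode, count = max(mode_counts.items(), key=lambda item: item[1])
    return mode, count / total


# Made-up counts for one tract; the mode names are illustrative only.
tract = {"drove alone": 800, "carpool": 100, "transit": 1200, "walked": 300, "other": 100}
mode, share = dominant_mode(tract)
print(mode, round(share, 2))  # transit 0.48
```

In this made-up tract, transit wins with 48 percent: a plurality rather than an outright majority, which is the case the returned share lets you flag.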

What got you started on it?

It was your post about the Census data and breaking it down by ZIP code to show people how many homes have cars. I’ve used that method a few times. Looking up each case every time it came up took too long, so this puts it all in one place.

What story did you want to tell?

I wanted to demonstrate that many households in the city don’t have any cars at all, and these residents need to be planned for as well. What I really liked was how the north side transit lines stuck out. Those clearly have an impact on how people commute, but I wonder what the cause is. Are the Red and Brown Lines really good lines (in people’s opinions) so they take them, or are people deciding to live closer to the lines because they want to use them (because they work downtown, for example)?

I decided to post the map on Thursday because I was writing a story about a proposed development in Uptown and wanted information on how many people around that development had cars. As the map shows, almost all of Uptown is transit-commuting, and a lot of us don’t even own any cars.

What data and tools did you use?

I first used the Chicago Data Portal to grab the census tract boundaries. Then I grabbed all of the census data for B08141 (“means of transportation to work by number of vehicles available”) and DP04 (“selected housing characteristics”) for each tract and combined them using the tract ID and Excel’s VLOOKUP formula.
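The same VLOOKUP-style join can be scripted. Below is a minimal Python sketch, not Shaun’s workflow, assuming the boundary file has a hypothetical `geoid10` column and each census download a hypothetical `GEO.id` column, both ending in the 11-digit state, county and tract code. It indexes each table by tract ID once, so every lookup is a dictionary hit rather than a scan down a column.

```python
import re
from typing import Dict, List, Tuple

Row = Dict[str, str]


def tract_key(raw_id: str) -> str:
    """Reduce a tract ID to its last 11 digits (state + county + tract code), dropping any prefix."""
    return re.sub(r"\D", "", raw_id)[-11:]


def index_by_tract(rows: List[Row], id_field: str) -> Dict[str, Row]:
    """Build the lookup table once, keyed on the normalized tract ID."""
    return {tract_key(row[id_field]): row for row in rows}


def join_tracts(boundaries: List[Row], boundary_id: str, tables: List[Tuple[List[Row], str]]) -> List[Row]:
    """For each boundary row, merge in the columns of every census row with the same tract ID."""
    indexes = [index_by_tract(rows, id_field) for rows, id_field in tables]
    joined = []
    for boundary in boundaries:
        key = tract_key(boundary[boundary_id])
        merged = dict(boundary)
        for index in indexes:
            # A missing tract adds no columns, much like VLOOKUP returning #N/A.
            merged.update(index.get(key, {}))
        joined.append(merged)
    return joined


# Made-up rows; column names and values are hypothetical.
boundaries = [{"geoid10": "17031010100"}]
b08141 = [{"GEO.id": "1400000US17031010100", "workers_no_vehicle": "250"}]
dp04 = [{"GEO.id": "1400000US17031010100", "households_no_vehicle": "400"}]
rows = join_tracts(boundaries, "geoid10", [(b08141, "GEO.id"), (dp04, "GEO.id")])
print(rows[0])
# {'geoid10': '17031010100', 'GEO.id': '1400000US17031010100', 'workers_no_vehicle': '250', 'households_no_vehicle': '400'}
```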

Read the rest of this interview on Web Map Academy.

Getting a little closer to understanding Chicago’s pothole-filling performance status

Tom Kompare updated his web application, which tracks the progress of pothole repairs using data from the city’s data portal, in response to my question about how many potholes the city fills within 72 hours, the Chicago Department of Transportation’s performance measure.

He wrote to me via the Open Government Chicago group:

Without completely rewriting http://potholes.311services.org, I added a count of the number of open (not yet addressed) pothole repair tickets (requests) that exceed 3 days old. As of today, the data from the City of Chicago’s Data Portal shows 1,334 of the 1,404 open tickets in the 311 system are older than three days.

Full disclosure: The web app actually looks for greater than 4 days old. The Data Portal’s pothole data are only updated once a day, so these data are always a day old. 4 – 1 = 3.

Keep in mind that this web app only shows how many requests are yet to be addressed, and does not count how many have been patched within CDOT’s 3-day goal during some arbitrary time period. That is a much more intense calculation than this pure client-side JavaScript web application can handle due to bandwidth restrictions on mobile (3G/4G). This web app already pushes the mobile envelope with the amount of data downloaded. I can fix that, but, again, not without a rewrite.

Still, 1,334 open repair requests (12/16/2013 Data Portal data) is quite different from the number of open repair requests CDOT reported on 12/16/2013 (560 in alleys, 193 on streets). I’m not sure what accounts for the difference.
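Kompare’s threshold arithmetic (4 – 1 = 3) can be written out directly. This is a minimal Python sketch, not his actual code (his app is client-side JavaScript), assuming each open request is represented only by its creation timestamp: it counts requests older than the 3-day goal plus a 1-day data lag.

```python
from datetime import datetime, timedelta
from typing import Iterable


def count_overdue(created_times: Iterable[datetime], now: datetime,
                  goal_days: int = 3, data_lag_days: int = 1) -> int:
    """Count open requests older than the goal, padding the threshold by the data's lag.

    The portal data are about a day old, so "older than 3 days" in the data
    becomes "older than 4 days" measured from now.
    """
    threshold = timedelta(days=goal_days + data_lag_days)
    return sum(1 for created in created_times if now - created > threshold)


# Made-up open requests, opened 1, 4, 5 and 30 days before "now".
now = datetime(2013, 12, 16, 12, 0)
open_requests = [now - timedelta(days=d) for d in (1, 4, 5, 30)]
print(count_overdue(open_requests, now))  # 2
```

A request exactly four days old is not counted, matching the “greater than 4 days old” rule.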

This reminds me of a third issue with the way CDOT presents pothole performance data online (the first being that it’s a PDF, the second that it doesn’t work in Safari). The six PDF files are overwritten with every new day of data. If you want information from two days ago, well, you’d better have downloaded the PDF two days ago!

CDOT misses the lesson on open data transparency

Publishing the wrong measurement as a PDF isn’t transparency.

The Chicago Department of Transportation released the first progress report on its Chicago Forward Action Agenda in October, two and a half years after the plan – the first of its kind – was published. I’ve spent an inordinate amount of time reading it and putting off a review. Why? It’s been difficult to compare the original and updated documents. The update is extremely light on specifics and details for the many goals in the Action Agenda, which should have organizational impacts (like record keeping and efficiency improvements) and public impacts (like figuring out which intersections have the most crashes). I’ll publish my in-depth review this week.

Aside from missing specifics and details, the update presents information differently and lacks status updates for the three to five “performance measures” in each chapter. It was difficult to understand CDOT’s reported progress without holding the original and the update side by side. Listing the original action item, the progress symbol, and then a status update would have made the document easier to read.

The update measures some action items differently than originally called for, and the way it presents pothole repair, a problem for people bicycling and driving, caught my analytical eye.

CDOT’s pothole-filling performance measure is the percentage of potholes “patched or fixed within 72 hours of being reported,” which it wants to increase. But according to the website Chicago Potholes, which tracks the city’s open data, the average time to close a pothole request is 101 days*. The update doesn’t really explain why, saying only that “the 72 hour goal for filling potholes is not always feasible due to asphalt plant schedules,” and nothing about the performance measure itself.

As originally written, the only way to report on this measure would be to list the percentage of potholes filled within the goal time, both at the outset and in the update. The measure has a complementary action item, an online dashboard, which could have provided the answer but didn’t.
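Here’s roughly what reporting the measure as written could look like. It’s a minimal Python sketch, assuming each request is a hypothetical (reported, completed) pair of timestamps, with `completed` set to `None` while the request is open. Only requests reported at least 72 hours before the cutoff are counted, so a still-open request past its deadline counts as a miss rather than being ignored.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Request = Tuple[datetime, Optional[datetime]]  # (reported, completed or None if still open)


def percent_within_goal(requests: List[Request], as_of: datetime,
                        goal: timedelta = timedelta(hours=72)) -> Optional[float]:
    """Percentage of requests patched within the goal, among requests whose goal window has closed."""
    # Skip requests reported less than `goal` before `as_of`: their outcome isn't known yet.
    decided = [(reported, completed) for reported, completed in requests
               if as_of - reported >= goal]
    if not decided:
        return None
    on_time = sum(1 for reported, completed in decided
                  if completed is not None and completed - reported <= goal)
    return 100.0 * on_time / len(decided)


# Made-up requests, not the city's data.
as_of = datetime(2013, 12, 16)
requests = [
    (datetime(2013, 12, 1), datetime(2013, 12, 2)),   # patched in 1 day: on time
    (datetime(2013, 12, 1), datetime(2013, 12, 10)),  # patched in 9 days: late
    (datetime(2013, 12, 5), None),                    # still open after 11 days: late
    (datetime(2013, 12, 15), None),                   # reported 1 day ago: not yet decided
]
print(round(percent_within_goal(requests, as_of), 1))  # 33.3
```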

CDOT published that dashboard this summer as a series of six PDF files that update daily, and you can hardly call it useful.

Publishing PDF files in this age of open government data – popular with President Obama and Mayor Rahm Emanuel – is unacceptable. Even if they are accessible, meaning you can copy and paste the text, they are poor outlets for data given the nationally renowned civic innovation changes that Emanuel has succeeded in establishing.

There’s another problem: the pothole-tracking dashboard file tracks neither the time it takes to close a pothole request nor the number of pothole requests patched within 72 hours. It simply reports the number completed yesterday, the number completed year to date, and the number of unpatched requests. (I’ve posted the pothole-tracking file to Scribd because the dashboard [PDF] doesn’t work in Safari; I also notified city staff of this problem, which they acknowledged over three weeks ago.)

The “Chicago Works For You” website reports a different metric, that of the number of requests made each day, distributed by ward.

I discussed the proposed dashboard with former commissioner Gabe Klein over two years ago. He said he wanted to create a dashboard of projects “we’re working on that’s updated once a week.” Given how accessible Klein was to me, John Greenfield and other reporters, I’ll give him and CDOT a pass for not doing this. But Klein also said, “I’m really big on transparency and good communication. When I left [Washington,] D.C. our [Freedom of Information Act requests] were dramatically lowered.”

I’ll consider the pothole performance measure and action item “in need of major progress.”

* For stats geeks, the median is 86 days and the standard deviation is 84 days.
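The footnote’s figures are standard summary statistics. A minimal Python sketch, assuming you already have days-to-close for each completed request; it isn’t known whether Chicago Potholes reports the sample or the population standard deviation, so swap `statistics.stdev` for `statistics.pstdev` as needed.

```python
import statistics
from typing import Dict, Sequence


def summarize_days_to_close(days: Sequence[float]) -> Dict[str, float]:
    """Mean, median and sample standard deviation of days to close (needs at least two values)."""
    return {
        "mean": statistics.mean(days),
        "median": statistics.median(days),
        "stdev": statistics.stdev(days),  # statistics.pstdev gives the population version
    }


# Made-up durations, not the city's data.
print(summarize_days_to_close([1, 2, 3, 4, 100]))
```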