Category: Data

Outta left field: I recreated the city’s contractor listing website

LicensedChicagoContractors.com looks good and works quickly on mobile devices.

I’m working on a secret project to get something installed on the public way. The process to find out how to do it is as arduous as getting it done because you never finish learning the process. Every time you think you’ve figured something out, there’s something else.

To get the secret project installed I need a licensed contractor. Not only do I need a licensed contractor, but they must have the license to do work in the public way (versus doing work on private property).

The Chicago Department of Buildings publishes a continually updated list of licensed contractors on its website but it’s annoying to use. There’s no search, no permanent links, and if you leave the window open long enough this weird session manager kicks in and stops you from browsing to the next page of results.

I asked my followers on Twitter the best way to scrape the data. The ever-amusing Dan O’Neill, who leads the Smart Chicago Collaborative (which hosts the Chicago Crash Browser), recommended just copying and pasting all 10 pages. That would work fine for the first time, but I might need to do it a second time when the data updates. Nick Bennett jumped in and used Selenium, a tool that automates web browsers. He said, “it’s inefficient but for a small job like that I figured why bother with something faster”.

I imported the data into a MySQL table and ran through some of my “standard” data cleaning methods (like trimming leading and trailing spaces, removing odd characters, and extracting good information into other columns, like phone numbers and ZIP codes).
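For anyone who wants to try the same kind of cleanup outside of MySQL, here is a minimal sketch in Python. The field layout (a name plus a single "contact" string holding the address and phone number) and the function names are assumptions for illustration, not the structure of the city's data or of my table.

```python
import re

# Hypothetical patterns: a 10-digit US phone number and a 5-digit ZIP (optionally ZIP+4).
PHONE_RE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")


def clean_text(value):
    """Collapse runs of whitespace, trim the ends, and drop non-printable characters."""
    value = re.sub(r"\s+", " ", value).strip()
    return "".join(ch for ch in value if ch.isprintable())


def clean_record(name, contact):
    """Return a dict with the cleaned name plus the phone and ZIP pulled out of `contact`."""
    name = clean_text(name)
    contact = clean_text(contact)

    phone_match = PHONE_RE.search(contact)
    phone = "-".join(phone_match.groups()) if phone_match else None

    # Use the last 5-digit match so a 5-digit street number is not mistaken for the ZIP.
    zips = ZIP_RE.findall(contact)
    zip_code = zips[-1] if zips else None

    return {"name": name, "contact": contact, "phone": phone, "zip": zip_code}


# Made-up sample record, not a real contractor.
record = clean_record(
    "  ACME  BUILDERS\u00a0INC. ",
    "123 N Example St, Chicago, IL 60622\t(312) 555-0100 ",
)
print(record["name"], record["phone"], record["zip"])
```

Keeping the phone number and ZIP code in their own fields is what makes them sortable and searchable later.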

With PHP – my favorite web language – I created a single-page website that loads all 3,930 licensed general contractors extremely fast and uses the DataTables JavaScript library to enhance the table with search and sort. I used Bootstrap to make a responsive design, meaning it adjusts to fit multiple screen sizes, including smartphones and tablets.

I call it LicensedChicagoContractors.com.

The new website still doesn’t solve my problem of finding a company that can do work in the public way – I’m still working on this. The last online dataset I could find is on the city’s old http://egov.cityofchicago.org domain, and was cached by the Internet Archive’s Wayback Machine on January 25, 2010. Ideally this information – plumbers, public way, and general contractors – should be posted on the City’s data portal.

One day left to enter the Divvy Data Challenge

Divvy dock post-polar vortex

Divvy bikes have been covered in snow frequently this winter. Photo by Jennifer Davis.

As self-proclaimed Divvy Data Brigade Captain* in Chicago’s #opendata and #opengov community I must tell you that all Divvy Data Challenge submissions are due tomorrow, Tuesday, March 11. Divvy posted:

Help us illustrate the answers to questions such as: Where are riders going? When are they going there? How far do they ride? What are top stations? What interesting usage patterns emerge? What can the data reveal about how Chicago gets around on Divvy?

We’re interested in infographics, maps, images, animations, or websites that can help answer questions and reveal patterns in Divvy usage. We’re looking for entries to tell us something new about these trips and show us what they look like.

I’ve seen a handful of the entries so far, including some to which I’ve contributed, and I’m impressed. When the deadline passes I’ll feature my favorites.

Want to play with the data? You should start with these resources, in order:

  1. Divvy Data Challenge – rules and data download
  2. divvy-munging – download an enhanced version of Divvy’s data, with input from several #ChiHackNight hackers
  3. Bike Sharing Data Hackpad – this is where I’m consolidating all of the links to projects, visualizations, analysis, data, and blog posts.
  4. Divvy Data Google Group – a discussion group with over 25 members
  5. #DivvyData – chat on Twitter

It’s not too late to get started now on a project about the bikes themselves. Nick Bennett has crunched the numbers on the bikes’ activity and posted them to the Divvy Data Google Group. Want to use his data and initial analysis? He said “run with it”.

Share your work ahead of time and leave a comment with a link to your project.

* This title is a play on Christopher Whitaker’s position as Code For America Brigade Captain and all around awesome-doer of keeping track of everything that’s going on in these communities and publishing event write-ups on Smart Chicago Collaborative.

Divvy activity in Wicker Park-Bucktown

Divvy Bikes Outside Smoke Daddy

The Divvy bike-share station outside Smoke Daddy on Division Street at Wood Street is the fourth most popular in the Wicker Park & Bucktown neighborhoods. Photo by Daniel Rangel.

This is an analysis of the station use for Divvy bike-share stations in the Wicker Park and Bucktown neighborhoods (they blend together and it’s hard to know if the club or bar you’re going to is in one neighborhood or the other).

Each number counts discrete trips, from one station to another (or to the same station if the trip was longer than 3 minutes, to eliminate “hiccups” where the bike left the dock but didn’t actually go anywhere). Customer means someone who used a 24-hour pass and subscribers are annual members. Gender is self-reported on a member’s DivvyBikes.com user profile.

17 stations listed.

[table id=10 /]

This map of Wicker Park Divvy stations shows a residential service gap among the Damen/Cortland, Ashland/Armitage (Metra) and North/Wood stations.

Based on the popularity of the Ashland/Armitage station, which is right outside the Clybourn Metra station – a very popular train stop – I think there might be a residential service gap near Saint Mary of the Angels School. I recommend a Divvy station at Walsh Park this year because the Bloomingdale Trail will open and terminate there.

Notes

Not all of these stations were online when Divvy launched on June 28, 2013, but I haven’t yet looked into the history to see when each went online. Therefore direct comparisons are not appropriate until you have a trips per day number. Then, seasonality (very cold weather) has its own effect. At the very least, all stations were online by October 29th, with the final addition of the Lincoln Ave & Fullerton Ave (at Halsted) station.
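To make the trips-per-day idea concrete, here is a minimal sketch in Python. The function name, the trip counts, and the assumption that the counting period ends on December 31, 2013 are mine for illustration; only the two dates come from the note above.

```python
from datetime import date


def trips_per_day(trip_count, online_date, period_end=date(2013, 12, 31)):
    """Divide a station's trip count by the number of days it was online (inclusive)."""
    days_online = (period_end - online_date).days + 1
    if days_online <= 0:
        raise ValueError("station went online after the period ended")
    return trip_count / days_online


# Made-up trip counts: the raw totals differ a lot, but the daily rates are the same.
launch_station = trips_per_day(1870, date(2013, 6, 28))   # online 187 days
late_station = trips_per_day(640, date(2013, 10, 29))     # online 64 days
print(launch_station, late_station)
```

This only corrects for how long a station was online. It does not correct for seasonality, so a station that opened in late October is still being measured mostly in cold weather.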

Can someone use “R” to make a time series chart on the entire trips dataset so we can find the best cutoff time to eliminate “hiccups”?
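Until someone makes that chart in R, here is a rough sketch of the same idea in Python: count same-station trips in 30-second duration bins and look for where the spike of very short trips tapers off. The input format (tuples of start station, end station, seconds), the bin size, and the 10-minute ceiling are all assumptions for illustration.

```python
from collections import Counter


def duration_histogram(trips, bin_seconds=30, max_seconds=600):
    """Count same-station trips per duration bin.

    Keys are the start of each bin in seconds. Trips between different
    stations, and trips of max_seconds or longer, are ignored.
    """
    counts = Counter()
    for start_station, end_station, seconds in trips:
        if start_station == end_station and seconds < max_seconds:
            counts[(seconds // bin_seconds) * bin_seconds] += 1
    return dict(sorted(counts.items()))


# Made-up trips, not taken from the Divvy dataset.
sample_trips = [
    ("A", "A", 12),
    ("A", "A", 25),
    ("A", "A", 45),
    ("A", "B", 20),
    ("B", "B", 400),
    ("B", "B", 900),
]
for bin_start, count in duration_histogram(sample_trips).items():
    print(f"{bin_start:>4}s  {'#' * count}")
```

If the real data shows a tall cluster of bins near zero followed by a flat stretch, the start of the flat stretch is a reasonable candidate for the cutoff.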

Query used: SELECT count(`trip_id`), usertype, gender FROM `divvy_trips_distances` WHERE (start_station = 'Claremont Ave & Hirsch St' OR end_station = 'Claremont Ave & Hirsch St') AND seconds > 180 GROUP BY `usertype`, gender
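If you don't have the MySQL table handy, the query can be tried against a tiny stand-in using Python's built-in sqlite3 module. This is a sketch: the table and column names come from the query above, but the column types, the sample rows, and the "Other Station" name are made up, and the station name is passed as a parameter instead of being written into the SQL.

```python
import sqlite3

STATION = "Claremont Ave & Hirsch St"

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE divvy_trips_distances (
        trip_id INTEGER PRIMARY KEY, usertype TEXT, gender TEXT,
        start_station TEXT, end_station TEXT, seconds INTEGER)"""
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO divvy_trips_distances VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Subscriber", "Male", STATION, "Other Station", 600),
        (2, "Subscriber", "Male", "Other Station", STATION, 420),
        (3, "Subscriber", "Female", STATION, "Other Station", 900),
        (4, "Customer", None, STATION, STATION, 45),            # "hiccup", filtered out
        (5, "Customer", None, "Other Station", STATION, 1500),
        (6, "Subscriber", "Male", "Other Station", "Other Station", 700),  # unrelated
    ],
)


def station_breakdown(conn, station, min_seconds=180):
    """Count trips starting or ending at `station`, grouped by user type and gender."""
    query = """
        SELECT usertype, gender, COUNT(trip_id)
        FROM divvy_trips_distances
        WHERE (start_station = ? OR end_station = ?) AND seconds > ?
        GROUP BY usertype, gender
        ORDER BY usertype, gender
    """
    return conn.execute(query, (station, station, min_seconds)).fetchall()


for usertype, gender, trips in station_breakdown(conn, STATION):
    print(usertype, gender, trips)
```

Running the function once per station name would be one way to build a table like the one above.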

Where do Divvy riders go?

Divvys

Divvy bikes fit people of almost all sizes. Photo by Mike Travis (mikeybrick).

Divvy released the 2013 trip data on Tuesday for its data challenge. At the Divvy data-focused Open Gov Hack Night that I put together for the weekly meeting, Divvy and I presented the data, basic system operations info, and existing visualizations and apps. Thank you to Chris Whitaker at Smart Chicago Collaborative for writing the meeting recap.

I “ran the numbers” on some selected slices of the data to post on Twitter and they range from the useless to useful! I’m using the hashtag #DivvyData.

  • Average trip distance of members in 2013 is estimated to be slightly shorter than casuals: 1.81 miles versus 1.56 miles – tweet
  • Bike 321 has traveled the furthest: 989 miles. Beat the next bike by 0.2 miles – tweet
  • Women members on average took longer trips (but fewer trips overall) on @DivvyBikes than men in 2013. – tweet
  • The average trip distance of 759,788 trips (by members and casuals) in 2013 is an estimated 1.68 miles. – tweet
  • In 2013, 79.05% of member trips were by men and 20.95% by women. – tweet
  • On average in 2013, 24-hour pass holders (whom I call casuals) made trips 2.5x longer (time wise) than members. – tweet
  • Damen/Pierce Divvy station (outside the Damen Blue Line station) is most popular in Wicker Park-Bucktown – data
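The distances above are estimates. One rough way to estimate a trip's distance, when all you know is the start and end station, is the straight-line (great-circle) distance between the two stations' coordinates. The sketch below shows that approach in Python. It is an assumption for illustration and not necessarily how the figures above were produced, and it will understate real riding distance because nobody bikes in a straight line.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def average_trip_miles(trips):
    """Average straight-line distance over (lat1, lon1, lat2, lon2) tuples."""
    distances = [haversine_miles(*trip) for trip in trips]
    return sum(distances) / len(distances)


# Placeholder coordinates, not actual station locations.
trips = [
    (41.90, -87.67, 41.91, -87.68),
    (41.88, -87.63, 41.90, -87.63),
]
print(round(average_trip_miles(trips), 2))
```

Routing each trip along the street network would give a more realistic estimate, at the cost of needing a routing engine.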

How Chicagoans commute map: An interview with the cartographer

Chicago Commute Map by Transitized

A screenshot of the map showing Lakeview and the Brown, Red, Purple and Purple Line Express stations.

Shaun Jacobsen blogs at Transitized.com and yesterday published the How Chicagoans Commute map. I emailed him to get some more insight on why he made it, how, and what insights it tells about Chicago and transit. The map color-symbolizes census tracts based on the simple majority commuting transportation mode.
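As a sketch of what that symbolization rule could look like in code, the Python function below picks the commute mode with the most workers in a tract. The mode names and counts are made up, and reading "simple majority" as "the largest single mode" is my assumption; a stricter reading would require the winning mode to exceed half of all commuters.

```python
def dominant_mode(mode_counts):
    """Return the commute mode with the most workers, or None if the tract has no commuters.

    On a tie, the mode listed first in `mode_counts` wins.
    """
    if not mode_counts or sum(mode_counts.values()) == 0:
        return None
    return max(mode_counts, key=mode_counts.get)


# Made-up counts for a single hypothetical census tract.
tract = {"drive": 400, "transit": 550, "walk": 50}
print(dominant_mode(tract))
```

Each tract would then be filled with the color assigned to the mode this function returns.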

What got you started on it?

It was your post about the Census data and breaking it down by ZIP code to show people how many homes have cars. I’ve used that method a few times. The method of looking up each case each time it came up took too long, so this kind of puts it in one place.

What story did you want to tell?

I wanted to demonstrate that many households in the city don’t have any cars at all, and these residents need to be planned for as well. What I really liked was how the north side transit lines stuck out. Those clearly have an impact on how people commute, but I wonder what the cause is. Are the Red and Brown Lines really good lines (in people’s opinions) so they take them, or are people deciding to live closer to the lines because they want to use them (because they work downtown, for example)?

The reason I decided to post the map on Thursday was that I was writing the story about a proposed development in Uptown and wanted information on how many people had cars around that development. As the map shows, almost all of Uptown is transit-commuting, and a lot of us don’t even own any cars.

What data and tools did you use?

I first used the Chicago Data Portal to grab the census tract boundaries. Then I grabbed all of the census data for B08141 (“means of transportation to work by number of vehicles available”) and DP04 (“selected housing characteristics”) for each tract and combined it using the tract ID and Excel’s VLOOKUP formula.
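The VLOOKUP step is a join on tract ID. For readers who would rather script it, here is a minimal Python sketch of the same idea. The column names and tract IDs are placeholders, not the actual field names in B08141 or DP04.

```python
def join_by_tract(commute_rows, housing_rows):
    """Combine two lists of row dicts on their shared 'tract_id' key.

    Tracts missing from housing_rows are skipped (VLOOKUP would show #N/A).
    """
    housing_by_tract = {row["tract_id"]: row for row in housing_rows}
    joined = []
    for row in commute_rows:
        match = housing_by_tract.get(row["tract_id"])
        if match is None:
            continue
        joined.append({**match, **row})
    return joined


# Placeholder rows; tract IDs are kept as strings so leading zeros survive.
commute = [
    {"tract_id": "000100", "transit_commuters": 500},
    {"tract_id": "000200", "transit_commuters": 10},
]
housing = [
    {"tract_id": "000100", "households_no_vehicle": 300},
]
print(join_by_tract(commute, housing))
```

The joined rows can then be matched to the census tract boundaries by the same ID.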

Read the rest of this interview on Web Map Academy.