Category: Open Access

Using Google Refine to get the stories out of your data

Let’s say you’re perusing the 309,425 crash reports for automobile crashes in Chicago from 2007 to 2009 and you want to know a few things quickly.

Like how many REAR END crashes there were in January 2007 with more than 1 injury in the report. With Google Refine, you could do that in about 60 seconds. You just need to know which “facets” to set up.

By the way, there are 90 crash reports meeting those criteria. Look at the screenshot below for how to set that up.

Facets to choose to filter the data

  1. Get your January facet
  2. Add your 2007 facet
  3. Select “REAR END” in the collision type facet
  4. Choose to include all the reports where injury is greater than 1 (click “include” next to each number higher than 1)
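If you want to see the logic behind stacking facets, here is a minimal Python sketch of the same four filters. The record layout and field names (`month`, `year`, `collision_type`, `injuries`) are my own placeholders, not the real column names in the crash data, and the sample rows are made up for illustration.

```python
from collections import namedtuple

# Hypothetical record layout; the real crash report columns may be named differently.
Crash = namedtuple("Crash", ["month", "year", "collision_type", "injuries"])

def apply_facets(crashes, month, year, collision_type, injuries_over):
    """Keep only the reports that match every facet at once, like stacked facets."""
    return [
        c for c in crashes
        if c.month == month
        and c.year == year
        and c.collision_type == collision_type
        and c.injuries > injuries_over
    ]

# Made-up sample rows, only to exercise the filter.
sample = [
    Crash(1, 2007, "REAR END", 2),
    Crash(1, 2007, "REAR END", 1),
    Crash(1, 2007, "ANGLE", 3),
    Crash(2, 2007, "REAR END", 4),
    Crash(1, 2008, "REAR END", 2),
]

matches = apply_facets(sample, month=1, year=2007,
                       collision_type="REAR END", injuries_over=1)
print(len(matches))
```

Each facet narrows the set left by the others, which is why the order you add them in doesn't change the final count.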

After we do this, we can quickly create a map using another Google tool, Fusion Tables.

Make a map

  1. Click Export… and select “Comma-separated value.” The file will download. (Make sure your latitude and longitude columns are called latitude and longitude instead of XCOORD and YCOORD; otherwise Fusion Tables will sometimes choke on the location and try to geocode your records, which is redundant.)
  2. Go to Google Fusion Tables and click New Table>Import Table and select your file.
  3. Give the new table a descriptive title, like “January 2007 rear end crashes with more than 1 injury”
  4. In the table view, click Visualize>Map.
  5. BAM!
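If your export still has XCOORD and YCOORD headers, you can rename them before uploading. Here is a rough Python sketch. It assumes XCOORD holds longitude and YCOORD holds latitude (x is conventionally the east-west value) and that the values are already in decimal degrees, so check your own file before trusting that mapping.

```python
import csv
import io

# Assumption: XCOORD is longitude and YCOORD is latitude, both in decimal degrees.
RENAMES = {"XCOORD": "longitude", "YCOORD": "latitude"}

def rename_coordinate_columns(csv_text):
    """Return the CSV text with the coordinate headers renamed; data rows are unchanged."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    rows[0] = [RENAMES.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Made-up sample row for illustration.
sample = "id,XCOORD,YCOORD\n1,-87.6298,41.8781\n"
renamed = rename_coordinate_columns(sample)
print(renamed)
```

Only the header row is touched, so the same function works on a file of any length once you read it into a string.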

I completed all the tasks on this page in under 5 minutes and then spent 5 more minutes writing this blog. “The power of Google.”

Be specific. Be, be specific.

Update September 5, 2011: I gave a short speech to Moving Design participants about language and word choice, a kind of follow-up to this article, as a “policy insight of the day.”

When speaking or presenting, be as specific as possible. The following examples all come from transportation discussions.

“Car traffic banned from this road.” Are you also banning trucks and SUVs?

“Vehicles will be rerouted.” Does this include those riding bicycles? Here’s an example of a current detour that only mentions cars, buses, and trucks. Which route should someone riding a bicycle take? Sometimes state and local laws will classify a bicycle as a vehicle, but then exclude it in specific passages – it’s weird. Better just call out specific vehicles, be they of the motorized or human-powered variety.

“Cars are aggressive to bikes.” Cars and bikes don’t operate themselves.

“We plan to narrow the road to calm traffic.” Are you going to narrow the road, or narrow certain lanes and reassign portions of the road to different uses, like a protected bike lane or a wider sidewalk? Then give the measurements of the lanes, the sidewalk, and the curb face-to-curb face width. Consider that “street” is not a synonym for “road.” Road often means the pavement between the curbs, while street includes the road as well as the sidewalk. Street is a bit more abstract as well, sometimes meaning the activity that occurs on or around roads (like “street life”).

“Ignorant drivers…” Or do they lack specific education and relevant information?

This bikeway in Bremen, Germany, uses both color and pavement design to delineate space for people bicycling (like me) and people walking.

Better bike crash map now available for Chicago

I met Derek at a get-together for “urban geeks” last Tuesday where he told me he was making a filterable/searchable version of my Chicago bike crash map using the Google Fusion Tables API. It essentially allows you to perform SQL-like queries so the map shows different results instead of a single fixed view. It’s possible to do this yourself if you open the bike crash map in the full Google Fusion Tables interface (do that now).

You can use it now!

New bike crash map, click through to view

Derek’s map has the benefit of a great interface to drill down to the data you want. You can select a day, a surface condition, and the injury type. To download the data yourself, you’ll still have to access the full Fusion Tables interface.
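To give a feel for what a SQL-like filter on day, surface condition, and injury type looks like, here is a small sketch that uses Python’s built-in sqlite3 module as a stand-in. This is not the Fusion Tables API and not Derek’s code; the table name, column names, and sample values are all made up for illustration.

```python
import sqlite3

# In-memory database standing in for the hosted crash table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crashes (day TEXT, surface TEXT, injury TEXT)")
conn.executemany(
    "INSERT INTO crashes VALUES (?, ?, ?)",
    [
        ("Monday", "DRY", "INCAPACITATING"),
        ("Monday", "WET", "NONE"),
        ("Tuesday", "DRY", "INCAPACITATING"),
        ("Monday", "DRY", "NONE"),
    ],
)

def filter_crashes(conn, day, surface, injury):
    """Return only the rows matching all three drop-down style filters."""
    query = (
        "SELECT day, surface, injury FROM crashes "
        "WHERE day = ? AND surface = ? AND injury = ?"
    )
    return conn.execute(query, (day, surface, injury)).fetchall()

rows = filter_crashes(conn, "Monday", "DRY", "INCAPACITATING")
print(rows)
```

A filterable map works on the same idea: each control adds one condition to the WHERE clause, and only the matching rows get drawn.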

Door lane photo and graphic by Gary Kavanagh in Santa Monica, California.

And since the data is the same as my original map, crash reports involving motor vehicle doors are not included. Here’s why doorings are excluded.

Reminder about open data and Obama’s Open Government Directive

Soon after taking office, President Obama issued a memorandum about open government and opening government data. Then came the Open Government Directive* which said:

To the extent practicable and subject to valid restrictions, agencies should publish information online in an open format that can be retrieved, downloaded, indexed, and searched by commonly used web search applications. An open format is one that is platform independent, machine readable, and made available to the public without restrictions that would impede the re-use of that information.

Essentially, the executive government (er, Obama Administration) adopts the presumption of openness, that distributing public data is the default position and action to take.

Don’t squat on the data. Don’t fret over how people will view or manipulate the data – this is not your concern. Don’t delay its release. If you do this, you are a frigid dataist and I will remember this.

Photo of visual note taking at an open data seminar by Karen Quinn.

*The Directive has a little more backbone than the original memorandum: “This memorandum requires executive departments and agencies to take the following steps toward the goal of creating a more open government.”

Thank you to Tech President.

Trying out uDig, a free, multi-platform GIS application

ArcGIS is the standard in geographic information system applications. I don’t like that it’s expensive, unwieldy to install and update, and its user interface is stymying and slow*. I also use Mac OS X most of the time and ArcGIS is not available for Mac. It doesn’t have to be the standard.

I’ve tried my hand at Cartographica and QGIS. I really like QGIS because there’re many plugins, it’s open source, there’s a diverse community supporting it, and best of all, it’s free. I’ve written about Cartographica once – I’m not a fan right now.

My project

  • The data: Bicycle crashes in the City of Chicago as reported to IDOT for 2007-2009
  • Goal: Publish an interactive map of this data using Google Fusion Tables and its instant mapping feature.
  • Visualizing it: Added streets (prepared beforehand to exclude highways), water features, and city boundary (get that here)
  • Process: Combine bike crash data; reproject to WGS84 for Google; remove extraneous information; add latitude/longitude coordinates; export as CSV; upload to Google Fusion Tables; map it!
  • View the final product

Trying out uDig

In reaching my goal I had a task that I couldn’t figure out how to complete with QGIS: I needed to combine three shapefiles with identical table schemas into one shapefile, which would eventually be published as one map. The join feature in fTools wasn’t working so I looked for a new solution, uDig, or “User-friendly Desktop Internet GIS.”

The solution was very easy. Highlight all the records in the attribute table of one shapefile, click Edit>Copy, then select the destination table and click Edit>Paste. The new records were added within a couple of seconds. I could then bring this data back into QGIS to finish the process (outlined above under My project). I did use fTools later in the process to add lat/long coordinates to my single shapefile.
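The copy-and-paste merge only works because the three tables share one schema. Here is a minimal Python sketch of that idea, working on plain attribute rows rather than real shapefiles (no geometry is handled here). The field names and rows are hypothetical.

```python
def combine_tables(tables):
    """Append the rows of several (fields, rows) tables that share one schema.

    Raises ValueError if any table's field list differs from the first one.
    """
    if not tables:
        return [], []
    header = tables[0][0]
    combined = []
    for fields, rows in tables:
        if fields != header:
            raise ValueError(f"schema mismatch: {fields} != {header}")
        combined.extend(rows)
    return header, combined

# Hypothetical attribute tables, one per year, with made-up rows.
fields = ["crash_id", "year"]
crashes_2007 = (fields, [[1, 2007], [2, 2007]])
crashes_2008 = (fields, [[3, 2008]])
crashes_2009 = (fields, [[4, 2009]])

header, combined = combine_tables([crashes_2007, crashes_2008, crashes_2009])
print(header, len(combined))
```

Checking the schema first is the one thing the manual paste doesn’t do for you, so it’s worth confirming the columns line up before you paste.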

After adding more data to better visualize the crashes in Chicago, I noticed that uDig renders maps to look smoother and slightly prettier than QGIS or ArcGIS. See the screenshot below.

A screenshot of the three bicycle crash datasets (2007, 2008, 2009) with the visualization data added.

The end product: three years of police-reported bicycle crashes in the City of Chicago on an interactive map powered by Google Fusion Tables, another product in Google’s arsenal of GIS for the poor man. View the final product.

*I haven’t used ArcGIS version 10 yet, which I see and read has an improved user interface; it’s unclear to me and other users if the program’s been updated to take advantage of multi-core processors. ESRI has a roundabout way of describing their support.