Tag: GIS

Using Google Refine to get the stories out of your data

Let’s say you’re perusing the 309,425 crash reports for automobile crashes in Chicago from 2007 to 2009 and you want to know a few things quickly.

Like how many REAR END crashes were reported in January 2007 with more than 1 injury. With Google Refine, you could find out in about 60 seconds. You just need to know which “facets” to set up.

By the way, there are 90 crash reports meeting those criteria. Look at the screenshot below for how to set that up.

Facets to choose to filter the data

  1. Get your January facet
  2. Add your 2007 facet
  3. Select the collision type of “REAR END” facet
  4. Choose to include all the reports where injury is greater than 1 (click “include” next to each number higher than 1)
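
The four facets above amount to four conditions applied to every row. Here is a minimal Python sketch of the same filter, assuming the crash reports were exported to CSV. The column names (`month`, `year`, `collision_type`, `injuries`) and the sample rows are made up for illustration; your export will probably name them differently.

```python
import csv
import io

def count_matching(rows, month, year, collision_type, min_injuries):
    """Count rows that satisfy all four facets; injuries must exceed min_injuries."""
    count = 0
    for row in rows:
        if (int(row["month"]) == month
                and int(row["year"]) == year
                and row["collision_type"].strip().upper() == collision_type
                and int(row["injuries"]) > min_injuries):
            count += 1
    return count

# Tiny made-up sample standing in for the real export (hypothetical columns).
SAMPLE = """month,year,collision_type,injuries
1,2007,REAR END,2
1,2007,REAR END,1
1,2007,ANGLE,3
2,2007,REAR END,4
1,2008,REAR END,2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
matches = count_matching(rows, month=1, year=2007,
                         collision_type="REAR END", min_injuries=1)
print(matches)
```

Refine is still quicker for exploring, but a script like this is handy when you want to repeat the same question for every month.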

After we do this, we can quickly create a map using another Google tool, Fusion Tables.

Make a map

  1. Click Export… and select “Comma-separated value.” The file will download. (Make sure your latitude and longitude columns are called latitude and longitude instead of XCOORD and YCOORD or sometimes Fusion Tables will choke on the location and try to geocode your records, which is redundant.)
  2. Go to Google Fusion Tables and click New Table>Import Table and select your file.
  3. Give the new table a descriptive title, like “January 2007 rear end crashes with more than 1 injury”
  4. In the table view, click Visualize>Map.
  5. BAM!
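
If your export still has XCOORD and YCOORD headers, you can rename them before uploading. This is only a sketch: it assumes XCOORD holds longitude and YCOORD holds latitude, and that the values are already in decimal degrees. Renaming does not reproject anything.

```python
import csv
import io

# Assumed mapping: X is longitude, Y is latitude, values already in decimal degrees.
RENAMES = {"XCOORD": "longitude", "YCOORD": "latitude"}

def rename_headers(csv_text, renames=RENAMES):
    """Return csv_text with header names replaced according to renames."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    writer.writerow([renames.get(name, name) for name in header])
    writer.writerows(reader)  # data rows pass through unchanged
    return out.getvalue()

# Made-up one-row sample.
sample = "id,XCOORD,YCOORD\n1,-87.6298,41.8781\n"
fixed = rename_headers(sample)
print(fixed)
```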

I completed all the tasks on this page in under 5 minutes and then spent 5 more minutes writing this blog. “The power of Google.”


A map that focuses on striped bikeways in downtown Chicago.

When you look at your bikeways more abstractly, like in the graphic above, do you see deficiencies or gaps in the network? Anything glaring or odd?

It’s a simple exercise: Open up QGIS and load in the relevant geographic data for your city. For Chicago, I added the city boundary, hydrography and parks (for locational reference), and bike lanes and marked-shared lanes*. Symbolize the bikeways to stand out in a bright color. I had the Chicago Transit Authority stations overlaid, but I removed them because it minimized the “black hole of bikeways” I want to show.

What do you see?

Bigger impact map

This exercise could have more impact if it were visualized differently. You have to be familiar with downtown Chicago and the Loop to fully understand why it’s important to notice what’s missing. It’s an extremely office- and job-dense neighborhood. It also has one of the highest densities of students in the country, and the number of people residing downtown continues to grow. If I had good data on how many workers and students there were per building, I could indicate that on the map to show just how many people are potentially affected by the lack of bicycle infrastructure that leads them to their jobs (or class) in the morning, and home in the evening. I don’t know how to account for all of the bicycling that goes through downtown just for events, like at Millennium and Grant Parks, the Cultural Center, and other theaters and venues.

*If you cannot find GIS data for your city, please let me know and I will try to help you find it. It should be available for your city as a matter of course.

Free online GIS tools: An introduction to GeoCommons

Read my tutorial on how I created the pedestrian map with GeoCommons. Read on for an introduction to GeoCommons and online GIS tools.

GeoCommons, like Google My Maps and Earth, is part of the “poor man’s GIS package.” It’s another tool that provides a few of the functions that desktop GIS software offers. But it excels at making simple and somewhat complex maps.

I first used GeoCommons over a year ago. I started using it because it would convert whatever data you uploaded into another format that was probably more useful. I mentioned it in this article about converting files. For example, if you have a KML file, you can upload it and export it as a shapefile for GIS programs, or a CSV file to load into a table editor or spreadsheet application.
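
To give a feel for what a KML-to-CSV conversion involves, here is a rough Python sketch that pulls point placemarks out of a KML document. It is not how GeoCommons does it; it only handles simple `Point` placemarks, and the sample KML is made up.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

def kml_points_to_rows(kml_text):
    """Return (name, latitude, longitude) tuples for each point Placemark."""
    root = ET.fromstring(kml_text)
    rows = []
    for placemark in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # not a point (for example a line or polygon), so skip it
        lon, lat = coords.strip().split(",")[:2]  # KML order is lon,lat[,alt]
        rows.append((name, float(lat), float(lon)))
    return rows

# Made-up sample document with a single point.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Sample station</name>
      <Point><coordinates>-87.6298,41.8781,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""

rows = kml_points_to_rows(SAMPLE_KML)
print(rows)
```

Note that KML stores longitude first, which trips up a lot of hand-rolled conversions.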

After creating the Chicago bike crash maps using Google Fusion Tables, I wanted to try out another map-making web application, one that provided more customization and prettier maps.

I found that web application and created a version of the bike crash maps, with several other data layers, in GeoCommons. I overlaid bike counts and bikeways so you can observe some relationships between each visual dataset. My latest map (screenshot below), created Wednesday, shows pedestrian counts in downtown Chicago overlaid with CTA and downtown Metra stations, as well as the 48 intersections with the most pedestrian collisions (from this UNC study, PDF).

Screenshot of pedestrian count map described above.

How these online GIS tools can be useful to you

I bet there’s a way you can use Google Fusion Tables and GeoCommons for your job or project. They’re extremely simple to use: they can take in data from the spreadsheets you’re already working on and turn them into themed reference maps. With mapping, you can do simple, visual analysis that doesn’t require statistical software or knowledge.

Imagine plotting your client list on a map and grouping them by age to see if perhaps your younger clients tend to live in the same neighborhoods of town, or if they’re more diverse (should you do this, keep the map private, something that you can’t do in GeoCommons – yet).

You may also find it useful if you want to create a route for your salespeople or for visiting church members at their homes. Plot all the addresses on a map, then manually filter them into different groups based on the clusters you see. With Google Fusion Tables, you can easily add a new column with the GROUP information and apply a numbered or lettered group and then re-sort.

Other things you can do in GeoCommons

  • Merge tables with geography – I uploaded two datasets: a table containing census tract IDs and demographic information for Cook County I downloaded from the American FactFinder 2; and a shapefile containing Cook County census tracts boundary information. After merging them, I could download a NEW shapefile that contained both datasets.
  • Make multi-layer maps
  • Symbolize based on frequency/rate
  • Convert data – This is by far the most useful feature. It imports “shapefiles (SHP), comma separated values (CSV), Keyhole Markup Language (KML), and GeoRSS” and exports “Shapefile, CSV, KML, GeoRSS Atom, Spatialite, and JSON” (from the GeoCommons user manual).
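
The “merge tables with geography” idea is a join on a shared ID column. A minimal Python sketch of that join follows, assuming both datasets have been read into lists of dictionaries. The `tract_id` and `population` field names and all values are hypothetical, and this is not a description of what GeoCommons does internally.

```python
def merge_by_key(features, table_rows, key):
    """Attach matching table columns to each feature.

    Features without a match in table_rows keep only their own fields.
    """
    lookup = {row[key]: row for row in table_rows}
    merged = []
    for feature in features:
        combined = dict(feature)  # copy so the input is left untouched
        combined.update(lookup.get(feature[key], {}))
        merged.append(combined)
    return merged

# Made-up records: boundaries on one side, demographics on the other.
tracts = [
    {"tract_id": "0101", "geometry": "POLYGON(...)"},
    {"tract_id": "0102", "geometry": "POLYGON(...)"},
]
demographics = [
    {"tract_id": "0101", "population": 4200},
]

merged = merge_by_key(tracts, demographics, "tract_id")
print(merged)
```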

Read my tutorial on how I created the pedestrian map with GeoCommons.

How to create a map in GeoCommons

GeoCommons (GC) is like Google My Maps but more powerful. Read my introduction to GC.

Tips before starting

  • With GC, I’m still figuring out what I must decide before I choose to add or amend something and what I can edit after I’ve made a change.
  • You cannot edit the data table directly.
  • You CAN replace data – click “reupload” – but the columns must match between original and replacement data.
  • Click Save often when making the map. You never know when Adobe Flash is going to quit on you.

One of the busiest locations in Chicago, for people walking, or riding buses and trains. Also a lot of taxi traffic and medium bike traffic. At Adams Street and Riverside Plaza (er, the Chicago River).


  1. Prepare your data. “We support Spreadsheets (as CSVs), Shapefiles, KML, RSS, ATOM and GeoRSS. We also support WMS and Tile services!” GeoCommons has instructions on how to prepare your spreadsheets for geocoding (if not already geocoded; GC will also work with predefined XY coordinates or street addresses). Ensure fields holding numbers have their type set as numeric in the GIS or spreadsheet program or you may run into roadblocks later on when trying to analyze these fields.
  2. If uploading a shapefile, GC requires the SHX and DBF files as well. The PRJ file will also help GC know how to reproject your data on the fly. GC base layer maps are projected in WGS84, just like Google Maps. Without the PRJ file, your data may not show. [Can the user set projection?]
  3. Upload data.
  4. You need to turn your newly uploaded data from a “pending dataset” to a completed dataset. In this process you will tell GC a little more about your data, including which columns hold the XY coordinates (even though it guesses this). You can also change the attribute names and describe the content of those attributes (you can also change this later).
  5. So click “Next Step” to start this process.
  6. In the “Review Your Geodata” step, you may see that GC has found some additional columns in your dataset. I’m not sure why this is. Delete these columns by selecting the header and clicking Delete Column. Then click Save Changes. You can select multiple columns at a time by holding the Command (Mac) or Control (Windows) keys.
  7. Add metadata; edit attribute names and add descriptions.
  8. You’re done. GC will present you a page with statistics and options to download your data in different formats.
  9. If you want to make a map with more data, follow the process again starting at Step 1. If not, continue.
  10. Make a map! Click “Map Data” or the “Make a Map” button in toolbar.
  11. A map of the world will load. When GC has finished loading your “new layer,” the map will zoom in.
  12. For the pedestrian map, I want to symbolize the data with a single color but change the size of the circle based on how many people were counted there (your data must have this attribute in numeric form – if it doesn’t, you may have to reupload your data). Click “Add Data” and then, in the Map Brewer box that appears:
    1. Click on Visual Theme. Click next.
    2. Select the NUMERIC attribute. In the pedestrian data, this is “count.”
    3. Then select whether you want colors or sizes. You cannot change this later; you would have to delete the layer and add it again (using your already uploaded dataset).
    4. Select what type of classification you want. This is entirely up to you and how you want the map to look and based on what data you have. You can change this later.
    5. Choose your shape and color.
  13. Add more data by clicking Add Data button. I think my map would be more useful and interesting if it also showed where the train stations are, a major destination category for people who walk downtown on weekdays. I will symbolize by a solid color. Instead of visual theme, which I chose for the ped counts, I will just choose Points, Lines & Areas. At this time, GC doesn’t allow custom icons.
  14. Re-order layers by dragging them up and down in the layers box. Click on the boxy “handle” to the left of the layer.
  15. Change the layer names by single clicking on the layer name. Press Enter when you’re done.
  16. Change the map name by single-clicking on it. Press Enter when you’re done.
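
The classification choice in the Map Brewer step decides which circle size each count gets, and it only works when the attribute is numeric. Here is a small Python sketch of one common scheme, equal-interval classification. I don’t know exactly how GC computes its classes, so treat this as an illustration only; the counts and circle sizes are made up.

```python
def equal_interval_breaks(values, classes):
    """Return the upper bound of each class, splitting the range evenly."""
    low, high = min(values), max(values)
    width = (high - low) / classes
    return [low + width * i for i in range(1, classes + 1)]

def classify(value, breaks):
    """Return the 0-based index of the first class whose upper bound holds value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1  # guard against rounding at the top of the range

counts = [100, 250, 400, 900, 1300]  # made-up pedestrian counts
breaks = equal_interval_breaks(counts, 3)
sizes = [4, 8, 12]  # hypothetical circle sizes, one per class
symbols = [sizes[classify(c, breaks)] for c in counts]
print(breaks, symbols)
```

With skewed data like this, equal intervals put most locations in the smallest class, which is why it is worth trying the other classification options before settling on one.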

After creating my pedestrian map, I had some suggestions for GeoCommons, the people who collected the pedestrian count data, and my own map.

  • GeoCommons should add a map preview image for better sharing on Facebook and other websites that look for this.
  • GeoCommons should allow maps to be private after creation – I think after you click save, they are added to a gallery (I could be wrong).
  • The data collectors should add more locations, particularly around Union Station and the two Clinton CTA stations (also between CTA and Metra stations).
  • The data collectors should add “date collected” to the data table
  • The data collectors should extend survey hours to better match commuting patterns. A majority of the collections end at 5:45 PM while Metra’s rush hour ends just before 7 PM (this is when train departure frequency drops).
  • I should add ridership data to the train stations so we can see which CTA and Metra stations are most used.

You asked for it, you got it – Chicago bike count data

Note: This post doesn’t have any analysis of the data or report, nor do I make any observations. I think it’s more significant to hear the ideas you have about what you see in the map or read in the data.

A lot of people wanted the Chicago bike crash and injury data overlaid with bike counts data.

In 2009, the Chicago Department of Transportation (CDOT) placed automatic bike counting equipment at many locations around the city. It uses pneumatic tubes to count the number of bicyclists (excludes cars) at that point in the street – it counts ALL trips, and cannot distinguish between people going to work or going to school. This differs from Census data, which asks respondents to indicate how they go to work.

Well, good news for you! CDOT today released the bike counts report from data collected in 2009 (just in time). There has been overwhelming response about the bike crash map I published – this shows how rabid the public is for information on their environments (just yesterday someone told me that they switched bike routes based on the crash frequency they noticed on their original route).

The size of the blue dot indicates the bicycle mode share for that count location. Mode share is calculated by dividing the number of bikes by the total number of bikes and cars.
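
For reference, mode share here means bikes as a fraction of all counted vehicles. A short Python sketch with made-up numbers:

```python
def bike_mode_share(bikes, cars):
    """Return bikes as a fraction of all counted vehicles (bikes plus cars)."""
    total = bikes + cars
    if total == 0:
        return 0.0  # nothing counted, so avoid dividing by zero
    return bikes / total

# Made-up counts for one location.
share = bike_mode_share(bikes=300, cars=2700)
print(f"{share:.1%}")
```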

Get the data

A photo of the EcoCounter counting machine in action on Milwaukee Avenue (this was taken during testing phase, where CDOT compared automatic and manual counts to determine the machine’s accuracy).

How to use this map:

  1. Find a blue dot (count location) in an area you’re interested in.
  2. Zoom into that blue dot.
  3. Click on the blue dot to get the number of bikes counted there.
  4. Then observe the number of purple dots (crashes) near that count location.
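
If you download the data, you could do step 4 with a script instead of counting dots by eye. This is a rough sketch that counts crashes within a chosen radius of a count location using the haversine great-circle distance. The coordinates and the 200 meter radius are made up, and both datasets are assumed to be in latitude/longitude degrees.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def crashes_near(count_location, crashes, radius_m):
    """Count crash points within radius_m of the count location."""
    lat, lon = count_location
    return sum(1 for c_lat, c_lon in crashes
               if haversine_m(lat, lon, c_lat, c_lon) <= radius_m)

# Made-up points: one at the counter, one about 55 m north, one over 1 km north.
count_location = (41.8900, -87.6500)
crashes = [(41.8900, -87.6500), (41.8905, -87.6500), (41.9000, -87.6500)]
nearby = crashes_near(count_location, crashes, radius_m=200)
print(nearby)
```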

What do you see that’s interesting?

What else is coming?

Now let’s hope the Active Transportation Alliance and the Chicago Park District release their Lakefront Trail counts from summer 2010. CDOT may have conducted bicycle counts in 2010 as well – I hope we don’t have to wait as long for that data.

I hope to have a tutorial on how to use GeoCommons coming soon. You should bug me about it if I don’t post it within one week.

Photos of Chicago bike commuters by Joshua Koonce.

Bike crash map in the press

Thank you to the Bay Citizen, Gapers Block, and the Chicago Bicycle Advocate (lawyer Brendan Kevenides). They’ve all written about the bike crash map I produced using Google Fusion Tables. And WGN 720 AM interviewed me and aired it in April 2011.

View the map now. The map needs to be updated with injury severity, a field I mistakenly removed before uploading the data.

The Bay Citizen started this by creating their own map of bike crashes for San Francisco, albeit with more information. I had helped some UIC students obtain the data from the Illinois Department of Transportation for their GIS project and have a copy of it myself. I quickly edited it using uDig and threw it up online in an instant map created by Fusion Tables.

A guy rides his bicycle on the “hipster highway” (aka Milwaukee Avenue), the street with the most crashes, but also the one with the most people biking (in mode share and pure quantity).

Why did I make the map?

I made this project for two reasons: One is to continue practicing my GIS skills and to learn new software and new web applications. The second reason was to put the data out there. There’s a growing trend for governments to open up their databases, and your readers have probably seen DataSF.org’s App Showcase. But in Chicago, we’re not seeing this trend. Instead of data, we get a list of FOIA requests, or instead of searchable City Council meeting minutes, we get PDFs that link to other PDFs that you must first select from drop down boxes. But both of these are improvements from before.

I would love to help anyone else passionate about bicycling in Chicago to find ways to use this data or project to address problems. I think bicycling in Chicago is good for many people, but we can make it better and for more people.

Read the full interview.

Trying out uDig, a free, multi-platform GIS application

ArcGIS is the standard in geographic information system applications. I don’t like that it’s expensive, unwieldy to install and update, and its user interface is stymying and slow*. I also use Mac OS X most of the time and ArcGIS is not available for Mac. It doesn’t have to be the standard.

I’ve tried my hand at Cartographica and QGIS. I really like QGIS because there’re many plugins, it’s open source, there’s a diverse community supporting it, and best of all, it’s free. I’ve written about Cartographica once – I’m not a fan right now.

My project

  • The data: Bicycle crashes in the City of Chicago as reported to IDOT for 2007-2009
  • Goal: Publish an interactive map of this data using Google Fusion Tables and its instant mapping feature.
  • Visualizing it: Added streets (prepared beforehand to exclude highways), water features, and city boundary (get that here)
  • Process: Combine bike crash data; reproject to WGS84 for Google; remove extraneous information; add latitude/longitude coordinates; export as CSV; upload to Google Fusion Tables; map it!
  • View the final product

Trying out uDig

In reaching my goal I had a task that I couldn’t figure out how to complete with QGIS: I needed to combine three shapefiles with identical table schemas into one shapefile – this one shapefile would eventually be published as one map. The join feature in fTools wasn’t working so I looked for a new solution, uDig, or “User-friendly Desktop Internet GIS.”

The solution was very easy. Highlight all the records in the attribute table of one shapefile, click Edit>Copy, then select the destination table and click Edit>Paste. The new records were added within a couple seconds. I could then bring this data back into QGIS to finish the process (outlined above under Project). I did use fTools later in the process to add lat/long coordinates to my single shapefile.
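
The copy-and-paste only works because the three tables share the same columns. Here is a Python sketch of that same idea, assuming each year’s records have already been read into a list of dictionaries. It is not what uDig runs; the field names and records are hypothetical.

```python
def combine_tables(*tables):
    """Concatenate tables (lists of dicts) after checking their columns match."""
    expected = None
    combined = []
    for table in tables:
        for record in table:
            columns = set(record)
            if expected is None:
                expected = columns  # the first record defines the schema
            elif columns != expected:
                raise ValueError(f"schema mismatch: {sorted(columns)}")
            combined.append(record)
    return combined

# Made-up records, one per year, with identical (hypothetical) fields.
crashes_2007 = [{"crash_id": 1, "year": 2007}]
crashes_2008 = [{"crash_id": 2, "year": 2008}]
crashes_2009 = [{"crash_id": 3, "year": 2009}]

all_crashes = combine_tables(crashes_2007, crashes_2008, crashes_2009)
print(len(all_crashes))
```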

After adding more data to better visualize the crashes in Chicago, I noticed that uDig renders maps to look smoother and slightly prettier than QGIS or ArcGIS. See the screenshot below.

A screenshot of the three bicycle crash datasets (2007, 2008, 2009) with the visualization data added.

The end product: three years of police reported bicycle crashes in the City of Chicago on an interactive map powered by Google Fusion Tables, another product in Google’s arsenal of GIS for the poor man. View the final product.

*I haven’t used ArcGIS version 10 yet, which I see and read has an improved user interface; it’s unclear to me and other users if the program’s been updated to take advantage of multi-core processors. ESRI has a roundabout way of describing their support.

I want to make a crash reporting tool

UPDATE 12-01-10: Thank you to Richard Masoner for posting this on Cyclelicious. I have started collecting everyone’s great ideas and responses in this development document.

Hot on the heels of making my “Can I bring my bike on Metra right now?” web application, I am ready to start on the next great tool*.

I want to create a bicycle crash reporting tool for Chicago (but release the source code for any city’s residents to adopt) along the lines of B-SMaRT for Portlanders and the Boston Cyclist’s Union crash map based on 911 calls.

I’d rather not reinvent the wheel (but I’m very capable of building a new web application based in PHP and MySQL) so I’ve been trying to get in contact with Joe Broach, the creator of B-SMaRT, to get my hands on that source code.

Not exactly the type of crash I’ll be looking for. Photo by Jason Reed.

I want the Chicago Crash Collector (please think of a better name) to have both citizen-reported data, and data from police reports. I just sent in my FOIA request for police data to the Chicago Police Department, but I’m not holding my breath for that.

3D experimentation to improve pedestrian environment

3D experimentation to improve pedestrian environment from Steven Vance on Vimeo.

The movies aren’t the only place where you’ll see in 3D! Go check out Clark and Deming (about 2540 N Clark Street) in the Lincoln Park neighborhood to see special pedestrian safety markings on the pavement. They were installed on October 18, 2010.

Designed to increase nighttime visibility

Who’s involved?

  • Chicago Department of Transportation (CDOT)
  • National Highway Traffic Safety Administration (NHTSA)
  • Western Michigan University (WMU)

All three entities are involved in the installation of these optical illusion zig-zag markings.

From 2005 through 2009, there were at least 2 reports of injuries to people walking and at least 9 reports of collisions involving people riding bikes*. With that data in mind, I’m not sure why this location was selected for a pavement marking whose aim is to improve pedestrian safety. The data do indicate that this intersection has a lot of bicycle-related collisions, many more than I’m seeing for other intersections.

Clark and Deming

Curiously, there are no automobile traffic counts for miles in either direction on Clark Street so one cannot compare the number of collisions at the Clark and Deming intersection with other intersections in town. Out of over 1,200 count locations, Clark Street in Lincoln Park was skipped.

Not really stopping for pedestrians

*Data from the Illinois Department of Transportation Safety Data Mart (which was taken down in 2015 or 2016)

Obtaining Chicago Transit Authority geodata

A reader asked where they could get Chicago Transit Authority (CTA) data I didn’t already have on the “Find GIS data” page. I only had shapefiles for train lines and stations. Now I’ve got bus routes and stops.

You can download General Transit Feed Specification (GTFS) data from the CTA’s Developer Center. It’s updated regularly when service changes.

Screenshot from ESRI ArcMap showing the unedited shapes.txt file loaded via Tools>Add XY Data. Shapes.txt is an 18 MB comma-delimited text file with thousands of points that can be grouped together with their shape_id.

The GTFS has major benefits over providing shapefiles to the public.

  1. It can be easily converted to the common shapefile format, or KML format.
  2. Google, the inventor of GTFS, has defined and documented it well; it is unencoded and plaintext. These attributes make it easy for programmers and hackers to manipulate it in many ways. (see also item 4)
  3. Google provides a service to the public on its website, an easy to use and robust transit planning service.
  4. The data is stored as plaintext CSV files.
  5. While an agency like CTA may have a geodata server on its intranet, it is less likely to have the add-ons that provide mapping and geodata services over the internet, such as a Web Map Service (WMS) server or ArcIMS. These systems can be expensive to purchase and license, and we all know how the CTA seems to always be in a money crunch. Because the CTA updates its GTFS data for publishing to Google Maps, the public can download it at the same time and always have up-to-date information: the same geodata that ArcIMS or WMS would offer, at no additional cost.
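
As an illustration of how easy the plaintext format is to work with, here is a Python sketch that groups the points in shapes.txt by `shape_id` and orders them by `shape_pt_sequence`, which is the first step toward turning them into route lines. The column names come from the GTFS specification; the sample rows are made up, and this is not the ArcMap workflow I actually used.

```python
import csv
import io
from collections import defaultdict

def shapes_to_lines(shapes_text):
    """Group shapes.txt points by shape_id, ordered by shape_pt_sequence.

    Returns a dict mapping shape_id to a list of (lon, lat) pairs.
    """
    points = defaultdict(list)
    for row in csv.DictReader(io.StringIO(shapes_text)):
        points[row["shape_id"]].append((
            int(row["shape_pt_sequence"]),
            float(row["shape_pt_lon"]),
            float(row["shape_pt_lat"]),
        ))
    lines = {}
    for shape_id, pts in points.items():
        pts.sort()  # sort by sequence number so the line is drawn in order
        lines[shape_id] = [(lon, lat) for _, lon, lat in pts]
    return lines

# Made-up rows; note that shape A is listed out of order on purpose.
SAMPLE = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
A,41.88,-87.63,2
A,41.87,-87.63,1
B,41.90,-87.65,1
"""

lines = shapes_to_lines(SAMPLE)
print(lines)
```

Each list of (lon, lat) pairs can then be written out as a line in KML or another format of your choice.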

I couldn’t have pulled off this conversion in 24 hours without the help of Steven Romalewski’s blog, Spatiality. He pointed me to the right ArcMap plugin in this post about converting the Metropolitan Transportation Authority’s GTFS data into shapefiles. I hope Steven doesn’t move to Chicago lest my authority on GIS and transit be placed in check!

Make your own map of the CTA train routes and perform some kind of analysis – then share it with the rest of us!

Read more about my exercise in geodata conversion in the full post.