Tag: GIS

Using Google Refine to get the stories out of your data

Let’s say you’re perusing the 309,425 crash reports for automobile crashes in Chicago from 2007 to 2009 and you want to know a few things quickly.

Like how many REAR END crashes there were in January 2007 that had more than 1 injury in the report. With Google Refine, you could do that in about 60 seconds. You just need to know which “facets” to setup.

By the way, there are 90 crash reports meeting those criteria. Look at the screenshot below for how to set that up.

Facets to choose to filter the data

  1. Get your January facet
  2. Add your 2007 facet
  3. Select the collision type of “REAR END” facet
  4. Choose to include all the reports where injury is greater than 1 (click “include” next to each number higher than 1)

After we do this, we can quickly create a map using another Google tool, Fusion Tables.

Make a map

  1. Click Export… and select “Comma-separated value.” The file will download. (Make sure your latitude and longitude columns are called latitude and longitude instead of XCOORD and YCOORD or sometimes Fusion Tables will choke on the location and try to geocode your records, which is redundant.)
  2. Go to Google Fusion Tables and click New Table>Import Table and select your file.
  3. Give the new table a descriptive title, like “January 2007 rear end crashes with more than 1 injury”
  4. In the table view, click Visualize>Map.
  5. BAM!

I completed all the tasks on this page in under 5 minutes and then spent 5 more minutes writing this blog. “The power of Google.”

Gaps

A map that focuses on striped bikeways in downtown Chicago.

When you look at your bikeways more abstractly, like in the graphic above, do you see deficiencies or gaps in the network? Anything glaring or odd?

It’s a simple exercise: Open up QGIS and load in the relevant geographic data for your city. For Chicago, I added the city boundary, hydrography and parks (for locational reference), and bike lanes and marked-shared lanes*. Symbolize the bikeways to stand out in a bright color. I had the Chicago Transit Authority stations overlaid, but I removed them because it minimized the “black hole of bikeways” I want to show.

What do you see?

Bigger impact map

This exercise can have more impact if it was visualized differently. You have to be familiar with downtown Chicago and the Loop to fully understand why it’s important to notice what’s missing. It’s an extremely office and job dense neighborhood. It also has one of the highest densities of students in the country; the number of people residing downtown continues to grow. If I had good data on how many workers and students there were per building, I could indicate that on the map to show just how many people are potentially affected by the lack of bicycle infrastructure that leads them to their jobs (or class) in the morning, and home in the evening. I don’t know how to account for all of the bicycling that goes through downtown just for events, like at Millennium and Grant Parks, the Cultural Center, and other theaters and venues.

*If you cannot find GIS data for your city, please let me know and I will try to help you find it. It should be available for your city as a matter of course.

Free online GIS tools: An introduction to GeoCommons

Read my tutorial on how I created the pedestrian map with GeoCommons. Read on for an introduction to GeoCommons and online GIS tools.

GeoCommons, like Google My Maps and Earth, is part of the “poor man’s GIS package.” It’s another tool that provides (few) of the functions that desktop GIS software offers. But it excels at making simple and somewhat complex maps.

I first used GeoCommons over a year ago. I started using it because it would convert whatever data you uploaded into another format that was probably more useful. I mentioned it in this article about converting files. For example, if you have a KML file, you can upload it and export it as a shapefile for GIS programs, or a CSV file to load into a table editor or spreadsheet application.

After creating the Chicago bike crash maps using Google Fusion Tables, I wanted to try out another map-making web application, one that provided more customization and prettier maps.

I found that web application and created a version of the bike crash maps, with several other data layers, in GeoCommons. I overlaid bike counts and bikeways so you can observe some relationships between each visual dataset. My latest map (screenshot below), created Wednesday, shows pedestrian counts in downtown Chicago overlaid with CTA and downtown Metra stations, as well as the 48 intersections with the most pedestrian collisions (from this UNC study, PDF).

Screenshot of pedestrian count map described above.

How these online GIS tools can be useful to you

I bet there’s a way you can use Google Fusion Tables and GeoCommons for your job or project. They’re extremely simple to use: they can take in data from the spreadsheets you’re already working on and turn them into themed reference maps. With mapping, you can do simple, visual analysis that doesn’t require statistical software or knowledge.

Imagine plotting your client list on a map and grouping them by age to see if perhaps your younger clients tend to live in the same neighborhoods of town, or if they’re more diverse (should you do this, keep the map private, something that you can’t do in GeoCommons – yet).

You may also find it useful if you want to create a route for your salespeople or for visiting church members at their homes. Plot all the addresses on a map, then manually filter them into different groups based on the clusters you see. With Google Fusion Tables, you can easily add a new column with the GROUP information and apply a numbered or lettered group and then re-sort.

Other things you can do in GeoCommons

  • Merge tables with geography – I uploaded two datasets: a table containing census tract IDs and demographic information for Cook County I downloaded from the American FactFinder 2; and a shapefile containing Cook County census tracts boundary information. After merging them, I could download a NEW shapefile that contained both datasets.
  • Make multi-layer maps
  • Symbolize based on frequency/rate
  • Convert data – This is by far the most useful feature. It imports “shapefiles (SHP), comma separated values (CSV), Keyhole Markup Language (KML), and GeoRSS” and exports “Shapefile, CSV, KML, GeoRSS Atom, Spatialite, and JSON” (from the GeoCommons user manual).

Read my tutorial on how I created the pedestrian map with GeoCommons.

How to create a map in GeoCommons

GeoCommons (GC) is like Google My Maps but more powerful. Read my introduction to GC.

Tips before starting

  • With GC, I’m still figuring out what I must decide before I choose to add or amend something and what I can edit after I’ve made a change.
  • You cannot edit the data table directly.
  • You CAN replace data – click “reupload” – but the columns must match between original and replacement data.
  • Click Save often when making the map. You never know when Adobe Flash is going to quit on you.

One of the busiest locations in Chicago, for people walking, or riding buses and trains. Also a lot of taxi traffic and medium bike traffic. At Adams Street and Riverside Plaza (er, the Chicago River).

Tutorial

  1. Prepare your data.”We support Spreadsheets (as CSVs), Shapefiles, KML, RSS, ATOM and GeoRSS. We also support WMS and Tile services!” GeoCommons has instructions on how to prepare your spreadsheets for geocoding (if not already geocoded; GC will also work with predefined XY coordinates or street addresses). Ensure fields holding numbers have their type set as numeric in the GIS or spreadsheet program or you may run into roadblocks later on when trying to analyze these fields.
  2. If uploading a shapefile, GC requires the SHX and DBF files as well. The PRJ file will also help GC know how to reproject your data on the fly. GC base layer maps are projected in WGS84, just like Google Maps. Without the PRJ file, your data may not show. [Can the user set projection?]
  3. Upload data.
  4. You need to turn your newly uploaded data from a “pending dataset” to a completed dataset. In this process you will tell GC a little more about your data, including which columns hold the XY coordinates (even though it guesses this). you can also change the attribute names and describe the content of those attributes (you can also change this later).
  5. So click “Next Step” to start this process.
  6. In the “Review Your Geodata” step, you may see that GC has found some additional columns in your dataset. I’m not sure why this is. Delete these columns by selecting the header and clicking Delete Column. Then click Save Changes. You can select multiple columns at a time by holding the Command (Mac) or Control (Windows) keys.
  7. Add metadata; edit attribute names and add descriptions.
  8. You’re done. GC will present you a page with statistics and options to download your data in different formats.
  9. If you want to make a map with more data, follow the process again starting at Step 1. If not, continue.
  10. Make a map! Click “Map Data” or the “Make a Map” button in toolbar.
  11. A map of the world will load. When GC has finished loading your “new layer,” the map will zoom in.
  12. For the pedestrian map, I want to symbolize the data with a single color but changing the size of the circle based how many people were counted there (your data must have this attribute in numeric form – if it doesn’t you may have to reupload your data). Click “Add Data” and then in the Map Brewer box that appears by:
    1. Click on Visual Theme. Click next.
    2. Select the NUMERIC attribute. In the pedestrian data, this is “count.”
    3. Then select whether or not you want colors or sizes. You can not change this later. You would just delete the layer and add the layer again (using your already uploaded dataset).
    4. Select what type of classification you want. This is entirely up to you and how you want the map to look and based on what data you have. You can change this later.
    5. Choose your shape and color.
  13. Add more data by clicking Add Data button. I think my map would be more useful and interesting if it also showed where the train stations are, a major destination category for people who walk downtown on weekdays. I will symbolize by a solid color. Instead of visual theme, which I chose for the ped counts, I will just choose Points, Lines & Areas. At this time, GC doesn’t allow custom icons.
  14. Re-order layers by dragging them up and down in the layers box. Click on the boxy “handle” to the left of the layer.
  15. Change the layer names by single clicking on the layer name. Press Enter when you’re done.
  16. Change the map name by singe click on it. Press Enter when you’re done.

After creating my pedestrian map, I had some suggestions for GeoCommons, the people who collected the pedestrian count data, and my own map.

  • GeoCommons should add a map preview image for better sharing on Facebook and other websites that look for this.
  • GeoCommons should allow maps to be private after creation – I think after you click save, they are added to a gallery (I could be wrong).
  • The data collectors should add more locations, particularly around Union Station and the two Clinton CTA stations (also between CTA and Metra stations).
  • The data collectors should add “date collected” to the data table
  • The data collectors should extend survey hours to better match commuting patterns. A majority of the collections end at 5:45 PM while Metra’s rush hour ends just before 7 PM (this is when train departure frequency drops).
  • I should add ridership data to the train stations so we can see which CTA and Metra stations are most used.

You asked for it, you got it – Chicago bike count data

Note: This post doesn’t have any analysis of the data or report, nor do I make any observations. I think it’s more significant to hear the ideas you have about what you see in the map or read in the data.

A lot of people wanted the Chicago bike crash and injury data overlaid with bike counts data.

In 2009, Chicago Department of Transportation (CDOT) placed automatic bike counting equipment at many locations around the city. It uses pneumatic tubes to count the number of bicyclists (excludes cars) at that point in the street – it counts ALL trips, and cannot distinguish between people going to work or going to school. This is dissimilar from Census data which asks respondents to indicate how they go to work.

Well, good news for you! CDOT today released the bike counts report from data collected in 2009 (just in time). There has been overwhelming response about the bike crash map I published – this shows how rabid the public is for information on their environments (just yesterday someone told me that they switched bike routes based on the crash frequency they noticed on their original route).

The size of the blue dot indicates the bicycle mode share for that count location. Mode share calculated by adding bikes and cars and dividing by bikes.

Get the data

A photo of the EcoCounter counting machine in action on Milwaukee Avenue (this was taken during testing phase, where CDOT compared automatic and manual counts to determine the machine’s accuracy).

How to use this map:

  1. Find a blue dot (count location) in an area you’re interested in.
  2. Zoom into that blue dot.
  3. Click on the blue dot to get the number of bikes counted there.
  4. Then observe the number of purple dots (crashes) near that count location.

What do you see that’s interesting?

What else is coming?

Now let’s hope the Active Transportation Alliance and the Chicago Park District release their Lakefront Trail counts from summer 2010. CDOT may have conducted bicycle counts in 2010 as well – I hope we don’t have to wait as long for that data.

I hope to have a tutorial on how to use GeoCommons coming soon. You should bug me about it if I don’t post it within one week.

Photos of Chicago bike commuters by Joshua Koonce.