Tag: Google Maps

Using Google Refine to get the stories out of your data

Let’s say you’re perusing the 309,425 crash reports for automobile crashes in Chicago from 2007 to 2009 and you want to know a few things quickly.

For example: how many REAR END crashes were there in January 2007 with more than 1 injury in the report? With Google Refine, you could answer that in about 60 seconds. You just need to know which “facets” to set up.

By the way, there are 90 crash reports meeting those criteria. Look at the screenshot below for how to set that up.

Facets to choose to filter the data

  1. Get your January facet
  2. Add your 2007 facet
  3. Select the collision type of “REAR END” facet
  4. Choose to include all the reports where injury is greater than 1 (click “include” next to each number higher than 1)
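
If you would rather script the same filter, here is a minimal sketch in Python using only the standard library. The column names (month, year, collision_type, injuries) and the sample rows are made up for illustration; the real crash data has its own headers, so adjust accordingly.

```python
import csv
import io

# Made-up sample rows; the real export has its own column names.
SAMPLE = """month,year,collision_type,injuries
1,2007,REAR END,2
1,2007,REAR END,1
1,2007,ANGLE,3
2,2007,REAR END,4
1,2008,REAR END,2
1,2007,REAR END,5
"""

def filter_crashes(rows, month, year, collision_type, min_injuries):
    """Keep rows matching every facet, with injuries above min_injuries."""
    return [
        row for row in rows
        if int(row["month"]) == month
        and int(row["year"]) == year
        and row["collision_type"] == collision_type
        and int(row["injuries"]) > min_injuries
    ]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
matches = filter_crashes(rows, month=1, year=2007,
                         collision_type="REAR END", min_injuries=1)
print(len(matches))  # number of reports meeting all four criteria
```

Each condition in the list comprehension plays the role of one facet, and a row must pass all of them to be counted.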

After we do this, we can quickly create a map using another Google tool, Fusion Tables.

Make a map

  1. Click Export… and select “Comma-separated value.” The file will download. (Make sure your latitude and longitude columns are called latitude and longitude instead of XCOORD and YCOORD or sometimes Fusion Tables will choke on the location and try to geocode your records, which is redundant.)
  2. Go to Google Fusion Tables and click New Table>Import Table and select your file.
  3. Give the new table a descriptive title, like “January 2007 rear end crashes with more than 1 injury”
  4. In the table view, click Visualize>Map.
  5. BAM!
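
The column renaming mentioned in step 1 can also be done with a few lines of Python before uploading. This is only a sketch: it assumes XCOORD holds longitude and YCOORD holds latitude, already in decimal degrees. If your coordinates are in a projected system, they would need reprojecting first, which this does not do.

```python
import csv
import io

# Assumed mapping from the export's headers to the names Fusion Tables expects.
RENAMES = {"XCOORD": "longitude", "YCOORD": "latitude"}

def rename_headers(src, dst, renames):
    """Copy CSV from src to dst, renaming header cells found in renames."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow([renames.get(name, name) for name in header])
    writer.writerows(reader)  # data rows pass through unchanged

# In-memory stand-ins for the downloaded file and the renamed copy.
src = io.StringIO("id,XCOORD,YCOORD\n1,-87.65,41.88\n")
dst = io.StringIO()
rename_headers(src, dst, RENAMES)
print(dst.getvalue())
```

To use it on real files, pass file objects opened with newline="" in place of the StringIO objects.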

I completed all the tasks on this page in under 5 minutes and then spent 5 more minutes writing this blog post. “The power of Google.”

TransportationCamp: Real-Time Pedestrian and Bike Location, Session Two

Real-Time Pedestrian and Bike Location: How can we get it? What can we do with it? How can it not be creepy?
By Eric Fischer.

My summary of the discussion

There are many existing data sources that are published or have APIs that could stand as reasonable proxies for tracking people who are walking, biking, or just ambling around the city – some of this information is given away (via Foursquare) by those who are traveling, and other information is collected in real time (buses and taxis) and after the trip (travel surveys and Flickr photos). I don’t think the group agreed on any good use for this data (knowing where people are in the city right now), nor did the group come up with ways to ensure this collection is not “creepy.”

Eric’s original question involved the location of people bicycling, but the discussion spent more time talking about pedestrians. However, some techniques in tracking and data gathering could be applied to both modes.

See final paragraph for links on “further reading” that I find relevant to this discussion.

Schedule board at TransportationCamp West on Saturday in San Francisco at Public Works SF, 161 Erie Street.

[Ideas and statements are credited where I could keep track of who said what, and if I could see your name badge.]

Eric, starting us off:
We have a lot of information about where motor vehicles (MV) are in cities.
A lot of experience of city is not about being in a MV, though.

How many bikers are going through an intersection and NOT getting hurt?
Finding places where people walk and where people don’t.

Where do people go on foot and on bikes?
As far as I know this isn’t available

Foursquare has benefits (awards) so people are willing to give the data, but we don’t want another Please Rob Me.

In SF, there are flash mobs, sudden protests, Critical Mass

Data sources:
-buses – boarding and deboarding – you can get a flow map from this. Someone said that Seattle has this data open.
-CTPP (Census Transportation Planning Package)
-city ped count
-Eric: Where people get on/off taxis.

“CycleTracks” – sampling bias, people with iPhones
-70% of handheld devices are feature phones, not smart phones. So there’s another sampling bias.

Opt-in factor
How do you sample?

SF Planning Dept. had a little program or project that asked people to plot their three most common walking routes on a map.
What is your favorite street, and where do you not like to walk?

Eric: My collection tool is Flickr. Geotags and timestamps.

Magdalena Palugh: Are there incentives for commuting by bike? There are incentives for people who vanpool.
If there is incentive, I would gladly give up my data.
Michael Schwartz (SFCTA, sp?) What is difference <> SFCTA/MTA?

-If part of this is to get at where the trouble spots are, could you have people contribute where the good/bad parts are? “This overpass really sucks.”

Tom: Can you get peds from aerial images?
-Yes, but there’re too many limitations, like shade and tree cover. Also, aerial images may be taken at the wrong time (for a while the image of Market/Castro was taken during a festival).

Brandon Martin-Anderson: What strategies have you tried so far?
-aerial images
-Flickr/Picasa location
-Street View face blur (a lot of false positives)
Anything you plot looks kind of the same.

People like to walk where other people are. For safety reasons. -Good point on real-time basis.
Eric: Not a lobbying group for peds.
Eric: Find interesting places to go.
Richard: We need exposure data.

Paris bike sharing report showed that “Cycling is faster on Wednesdays.”
Europeans are more open to sharing their private details – possibly because of stricter regulation on what agencies can do with the collected data. (There was a little disagreement on this; I personally have heard the opposite.)

Andrew: Can we use something like Xbox Kinect to track these people?

National Bike/Ped Documentation Project – same format
Seattle – 4 different groups that do annual bike counts. UW bike planning studio.

Who pays for this?
-Transportation planners pay for this.
-Private development projects (from contractor).
-Universities, NSF, Google
-Community groups –

Further reading


Mike Fleisher – DS Solutions
Andrew – @ondrae – urbanmapping.com

Notes to self

Is Census question about commuting about time or distance of “most traveled” mode?
Splunk – data analysis tool
What is difference <> SFCTA/SFMTA?

Bike crash map in the press

Thank you to the Bay Citizen, Gapers Block, and the Chicago Bicycle Advocate (lawyer Brendan Kevenides). They’ve all written about the bike crash map I produced using Google Fusion Tables. And WGN 720 AM interviewed me and aired it in April 2011.

View the map now. The map needs to be updated with injury severity, a field I mistakenly removed before uploading the data.

The Bay Citizen started this by creating their own map of bike crashes for San Francisco, albeit with more information. I had helped some UIC students obtain the data from the Illinois Department of Transportation for their GIS project and have a copy of it myself. I quickly edited it using uDig and threw it up online in an instant map created by Fusion Tables.

A guy rides his bicycle on the “hipster highway” (aka Milwaukee Avenue), the street with the most crashes, but also the one with the most people biking (in mode share and pure quantity).

Why did I make the map?

I made this project for two reasons: One is to continue practicing my GIS skills and to learn new software and new web applications. The second reason was to put the data out there. There’s a growing trend for governments to open up their databases, and your readers have probably seen DataSF.org’s App Showcase. But in Chicago, we’re not seeing this trend. Instead of data, we get a list of FOIA requests, or instead of searchable City Council meeting minutes, we get PDFs that link to other PDFs that you must first select from drop down boxes. But both of these are improvements from before.

I would love to help anyone else passionate about bicycling in Chicago to find ways to use this data or project to address problems. I think bicycling in Chicago is good for many people, but we can make it better and for more people.

Read the full interview.

Trying out uDig, a free, multi-platform GIS application

ArcGIS is the standard in geographic information system applications. I don’t like that it’s expensive, unwieldy to install and update, and its user interface is stymying and slow*. I also use Mac OS X most of the time and ArcGIS is not available for Mac. It doesn’t have to be the standard.

I’ve tried my hand at Cartographica and QGIS. I really like QGIS because there’re many plugins, it’s open source, there’s a diverse community supporting it, and best of all, it’s free. I’ve written about Cartographica once – I’m not a fan right now.

My project

  • The data: Bicycle crashes in the City of Chicago as reported to IDOT for 2007-2009
  • Goal: Publish an interactive map of this data using Google Fusion Tables and its instant mapping feature.
  • Visualizing it: Added streets (prepared beforehand to exclude highways), water features, and city boundary (get that here)
  • Process: Combine bike crash data; reproject to WGS84 for Google; remove extraneous information; add latitude/longitude coordinates; export as CSV; upload to Google Fusion Tables; map it!
  • View the final product
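
The "remove extraneous information" and "export as CSV" steps can be sketched in Python as well. I did these in the GIS applications, so treat this as an alternative, not as what I actually ran. The field names (crash_id, internal_code, and so on) are hypothetical.

```python
import csv
import io

# Hypothetical field names; the real attribute table has different ones.
KEEP = ["crash_id", "year", "latitude", "longitude"]

def export_csv(records, keep, out):
    """Write only the fields listed in keep, in that order, as CSV."""
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # fields not in keep are silently dropped

records = [
    {"crash_id": "a1", "year": 2007, "latitude": 41.9, "longitude": -87.67,
     "internal_code": "x"},
    {"crash_id": "b2", "year": 2008, "latitude": 41.88, "longitude": -87.63,
     "internal_code": "y"},
]
out = io.StringIO()
export_csv(records, KEEP, out)
print(out.getvalue())
```

Listing the columns to keep, instead of the columns to drop, means a stray field never ends up in the published table by accident.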

Trying out uDig

In reaching my goal I had a task that I couldn’t figure out how to complete with QGIS: I needed to combine three shapefiles with identical table schemas into one shapefile – this one shapefile would eventually be published as one map. The join feature in fTools wasn’t working so I looked for a new solution, uDig, or “User-friendly Desktop Internet GIS.”

The solution was very easy. Highlight all the records in the attribute table of one shapefile, click Edit>Copy, then select the destination table and click Edit>Paste. The new records were added within a couple seconds. I could then bring this data back into QGIS to finish the process (outlined above under Project). I did use fTools later in the process to add lat/long coordinates to my single shapefile.
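
The copy-and-paste step works because all three tables share the same fields. Here is a rough Python sketch of the same idea, with a check that refuses to merge records whose fields differ. It works on plain dictionaries, not on shapefiles, and the function name and sample records are made up for illustration.

```python
def combine_tables(tables):
    """Concatenate lists of dict records after checking the field names match."""
    combined = []
    expected = None
    for table in tables:
        for record in table:
            fields = set(record)
            if expected is None:
                expected = fields  # first record defines the schema
            elif fields != expected:
                raise ValueError(f"schema mismatch: {sorted(fields ^ expected)}")
            combined.append(record)
    return combined

# Made-up records standing in for the 2007, 2008, and 2009 tables.
crashes_2007 = [{"crash_id": "a1", "year": 2007}]
crashes_2008 = [{"crash_id": "b2", "year": 2008}, {"crash_id": "b3", "year": 2008}]
crashes_2009 = [{"crash_id": "c4", "year": 2009}]

all_crashes = combine_tables([crashes_2007, crashes_2008, crashes_2009])
print(len(all_crashes))
```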

After adding more data to better visualize the crashes in Chicago, I noticed that uDig renders maps to look smoother and slightly prettier than QGIS or ArcGIS. See the screenshot below.

A screenshot of the three bicycle crash datasets (2007, 2008, 2009) with the visualization data added.

The end product: three years of police reported bicycle crashes in the City of Chicago on an interactive map powered by Google Fusion Tables, another product in Google’s arsenal of GIS for the poor man. View the final product.

*I haven’t used ArcGIS version 10 yet, which I see and read has an improved user interface; it’s unclear to me and other users if the program’s been updated to take advantage of multi-core processors. ESRI has a roundabout way of describing their support.

Best ways to present bicycle crash data

I started some preliminary work on my crash reporting tool. I haven’t written any code, but I’ve been working on the logistics of analyzing and presenting the data to the public.

I obtained bicycle crash data for 2009 from the Illinois Department of Transportation’s Division of Traffic Safety. I’m not able to distribute raw data (you’ll have to ask for it yourself) and Illinois statutes prevent me from distributing personally identifying data (but it’s really hard to know what this is). In the meantime, based on Ben Sheldon’s suggestion, I loaded some of the data into a private Google Fusion Table that instantly maps geocoded data (it can also geocode the data for you).

Richard cautions me about the way I choose to present data. I need to choose terms and descriptions carefully to avoid misinterpretations. Pete from the Boston Cyclist’s Union recommends against accepting self-reported data. I’ll be taking their advice into consideration as I move forward.

You see in the map (top) that a lot of crashes happen on Milwaukee Avenue (above). That’s where a lot of people ride (over 3,000 in 24 hours in the fall).

I have not begun to review the narrative details in the crash reports. Actually, they’re not very narrative because they’re fixed responses – no free writing allowed. And not every record represents a collision (meaning a crash with at least two parties). Many are self-crashes (is that a legit phrase?).

I’m not sure exactly what story I want the data to tell so it will probably be a while before I make anything public. One of my favorite geographic information books, Making Maps, talks about the endless ways maps can be designed and portrayed and that each tells a different story. It’s best if I know the story (a goal) ahead of time.