Best ways to present bicycle crash data

I started some preliminary work on my crash reporting tool. I haven’t written any code, but I’ve been working on the logistics of analyzing and presenting the data to the public.

I obtained bicycle crash data for 2009 from the Illinois Department of Transportation’s Division of Traffic Safety. I’m not able to distribute raw data (you’ll have to ask for it yourself) and Illinois statutes prevent me from distributing personally identifying data (but it’s really hard to know what this is). In the meantime, based on Ben Sheldon’s suggestion, I loaded some of the data into a private Google Fusion Table that instantly maps geocoded data (it can also geocode the data for you).


Richard cautions me about the way I choose to present data. I need to choose terms and descriptions carefully to avoid misinterpretations. Pete from the Boston Cyclist’s Union recommends against accepting self-reported data. I’ll be taking their advice into consideration as I move forward.


The map shows that a lot of crashes happen on Milwaukee Avenue. That’s where a lot of people ride (over 3,000 in 24 hours in the fall).

I have not begun to review the narrative details in the crash reports. Actually, they’re not very narrative because they’re fixed responses – no free writing allowed. And not every record represents a collision (meaning a crash with at least two parties). Many are self-crashes (is that a legit phrase?).

I’m not sure exactly what story I want the data to tell so it will probably be a while before I make anything public. One of my favorite geographic information books, Making Maps, talks about the endless ways maps can be designed and portrayed and that each tells a different story. It’s best if I know the story (a goal) ahead of time.


About Steven Vance

Enthusiast for urbanism, bicycling as transportation, and open data. Building a bicycle culture in Chicago.
  • James Wong

    Mapping seems like a great first step but when you’re talking about what story to have the data tell, I think it will be interesting to see if you choose to look at a network level or at a design level. For sure, geocoding data is useful for identifying corridors or intersections as high crash locations, but at the same time the best data would say “the bicyclist was injured in a NBRT hook at intersection whatever”. This kind of information is where data can really inform design/spot treatments to improve bike safety. Sadly, having seen my fair share of crash datasets and even getting to the crash report level, I am reasonably confident this refined level of data doesn’t exist. It’s a shame but police reporting is still just a checkbox for a “bicycle” crash. Will keep an eye here to see what you come up with. Good luck. -jcw

    • Steven Vance

      What does NBRT mean?

      So right hooks and left hooks are illegal under City of Chicago municipal code. But to record that kind of action, I believe it has to be written in a narrative.

      The narrative is NOT captured in the data I have from Illinois Department of Transportation, Division of Traffic Safety. The datasets of the Chicago Police and IDOT are incongruent and the Chicago Police data is scrubbed to fit inside the IDOT dataset. I asked the Chicago Police for a dataset and they said it wasn’t available. I think that means that they weren’t going to filter their crash data for “pedalcycles” as the letter said they’re not required to create new datasets.

  • Tony

    Hi Steven,

    So what do you think is the best way to display crash data?

    I agree with James Wong’s comments that looking at collisions as points on a map is a good place to start and then determining if there’s a network design issue. But before you even get to that, you’d need to look at the frequency at each intersection or along each corridor so that isolated incidents don’t dominate the story.
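
A minimal sketch of that frequency check, assuming each crash record has already been reduced to a location label. The labels, the function name `frequent_locations`, and the `min_crashes` cutoff are hypothetical and not taken from any real crash dataset.

```python
from collections import Counter


def frequent_locations(locations, min_crashes=2):
    """Count crashes per location and keep only locations at or above the cutoff."""
    counts = Counter(locations)
    return {loc: n for loc, n in counts.items() if n >= min_crashes}


# Hypothetical sample: one location label per crash record.
crashes = [
    "Intersection A",
    "Intersection A",
    "Intersection B",
    "Intersection A",
    "Intersection C",
]

# Locations with a single crash are treated as isolated incidents and dropped.
hot_spots = frequent_locations(crashes, min_crashes=2)
print(hot_spots)
```

The right cutoff depends on how many years of data are pooled and how busy each location is, so it is worth trying a few values before settling on one.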

    I work as a GIS Specialist for a bike/ped planning firm and work with CHP’s collision databases all the time. The data is pretty easily geocoded but I can only display points at intersections. There are fields in the database that say how far from the intersection the collision actually was, but these are largely rough estimates. It would be great to have the lat/long that I’ve seen in other jurisdictions. Anyway, I’ve resolved to join collision frequencies to the street network layer within 300 – 500′ (or some logical threshold) of the intersection, since the location I am working with is largely an estimate and I can still pinpoint hot spots.
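
A simplified sketch of a distance-threshold join like the one described above. It assigns each crash to its nearest intersection rather than to a street network layer, and it assumes all coordinates are already in a projected system measured in feet. The function name, the sample coordinates, and the 500-foot default are illustrative assumptions, not the workflow of any particular GIS package.

```python
import math
from collections import Counter


def count_near_intersections(crashes, intersections, max_feet=500.0):
    """Assign each crash to its nearest intersection, if within max_feet, and count."""
    counts = Counter()
    for cx, cy in crashes:
        # Find the nearest intersection and its straight-line distance in feet.
        name, dist = min(
            ((n, math.hypot(cx - ix, cy - iy)) for n, (ix, iy) in intersections.items()),
            key=lambda pair: pair[1],
        )
        # Crashes farther than the threshold are left unassigned.
        if dist <= max_feet:
            counts[name] += 1
    return counts


# Hypothetical planar coordinates in feet.
intersections = {"A": (0.0, 0.0), "B": (2000.0, 0.0)}
crash_points = [(100.0, 50.0), (-300.0, 0.0), (1900.0, 20.0), (1000.0, 0.0)]

near = count_near_intersections(crash_points, intersections, max_feet=500.0)
print(dict(near))
```

With real data, a spatial index or a GIS spatial join would replace the brute-force nearest search, but the threshold logic stays the same.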

    Have you had any recent developments since this post?

    Good luck!