Category: Tutorial

How to upload shapefiles to Google Fusion Tables

It is now possible to upload a shapefile (and its companion files SHX, PRJ, and DBF) to Google Fusion Tables (GFT).

Before we go any further, keep in mind that the application that does this will only process 100,000 rows. Additionally, GFT gives each user only 200 MB of storage (and, as far as I can see, doesn’t tell you how much of it you have used).

  1. Log in to your Google account (at Gmail, or at GFT).
  2. Prepare your data. Ensure it has fewer than 100,000 rows.
  3. ZIP up your dataX.shp, dataX.shx, dataX.prj, and dataX.dbf. Use WinZip for Windows, or for Mac, right-click the selection of files and select “Compress 4 items”.
  4. Visit the Shape to Fusion website. You will have to authorize the web application to “grant access” to your GFT tables. It needs this access so that after the web application processes your data, it can insert it into GFT.
  5. If you want a Centroid Geometry column or a Simplified Geometry column added, click “Advanced Options” and check their checkboxes – see notes below for an explanation.
  6. Choose the file to upload and click Upload.
  7. Leave the window open until it says it has processed all of the rows. It will report “Processed Y rows and inserted Y rows”. You will be given a link to the GFT the web application created.
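
Steps 2 and 3 can also be scripted. The following is a minimal Python sketch, not part of Shape to Fusion: it reads the row count from the DBF header (bytes 4 to 7 hold the record count as a little-endian 32-bit integer) and then zips the four files. The function names and the lowercase file extensions are my assumptions; the 100,000 figure is the limit mentioned above.

```python
import struct
import zipfile
from pathlib import Path

ROW_LIMIT = 100_000  # limit stated in this post for Shape to Fusion


def dbf_row_count(dbf_path):
    """Return the number of records stored in a dBASE (.dbf) header."""
    with open(dbf_path, "rb") as f:
        header = f.read(8)
    if len(header) < 8:
        raise ValueError("file is too short to be a DBF")
    # Bytes 4-7: record count, little-endian unsigned 32-bit integer.
    return struct.unpack("<I", header[4:8])[0]


def zip_shapefile(shp_path, zip_path=None):
    """Zip dataX.shp with its .shx, .prj and .dbf companions."""
    shp = Path(shp_path)
    # Assumes lowercase extensions next to the .shp file.
    parts = [shp.with_suffix(ext) for ext in (".shp", ".shx", ".prj", ".dbf")]
    missing = [p.name for p in parts if not p.exists()]
    if missing:
        raise FileNotFoundError("missing companion files: " + ", ".join(missing))

    rows = dbf_row_count(shp.with_suffix(".dbf"))
    if rows >= ROW_LIMIT:
        raise ValueError(f"{rows} rows; step 2 asks for fewer than {ROW_LIMIT}")

    zip_path = Path(zip_path) if zip_path else shp.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for part in parts:
            # Store each file without its directory path.
            archive.write(part, arcname=part.name)
    return zip_path
```

Calling `zip_shapefile("dataX.shp")` would write `dataX.zip` next to the shapefile, ready for step 6.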

Sample Data

If you’re looking to give this a try and see results quickly, try some sample data from the City of Chicago data portal:

Notes

I had trouble many times while using Shape to Fusion in that after I chose the file to upload and clicked Upload, I had to grant access to the web application again and start over (choose the file and click Upload a second time).

Centroid Geometry – This creates a column with the coordinates of the centroid of each polygon, listed in the original projection system. So if your projection is in feet, the value will be in feet. You can easily do the same thing in the free and open source QGIS, where you can also reproject files to get latitude and longitude values (in the WGS84 projection, EPSG 4326). The centroid value is wrapped in the field in KML syntax: “<Point><coordinates>X,Y</coordinates></Point>”.
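
To make the centroid column concrete, here is a small Python sketch that computes the area-weighted centroid of a simple polygon ring and wraps it in the KML syntax quoted above. I don't know which formula Shape to Fusion uses, so treat this as an illustration only; both function names are mine.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring if needed

    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("polygon has zero area")
    # The centroid formula divides by 6 * area, which is 3 * area2.
    return cx / (3 * area2), cy / (3 * area2)


def kml_point(x, y):
    """Format a coordinate pair the way the centroid column stores it."""
    return f"<Point><coordinates>{x},{y}</coordinates></Point>"


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(kml_point(*polygon_centroid(square)))
```

The output is in whatever units the input uses, which matches the behavior described above: feet in, feet out.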

Simplified Geometry – A geometry column is automatically created by the web application (or GFT, I’m not sure). This function will create a simpler version of that geometry, with fewer lines and vertices. It also creates columns that list the vertex counts for the simple and regular geometry columns.
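
I don't know which simplification method the web application uses. As an illustration of the general idea, here is a Python sketch of the well-known Douglas-Peucker algorithm, which drops vertices that lie within a tolerance of the line between their neighbors. The function names and the tolerance value are assumptions of mine.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker simplification of a list of (x, y) vertices."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    # Keep the farthest vertex and simplify both halves around it.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


line = [(0, 0), (1, 0.01), (2, 0), (3, 5), (4, 0)]
simple = simplify(line, tolerance=0.1)
print(len(line), "vertices before,", len(simple), "after")
```

The two `len()` values correspond to the vertex count columns mentioned above.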

Introduction to DIY bike ridership research

A lot of people ask me how many people are out there bicycling.

“Not a lot”, I tell them.

And I explain why: the primary source of data is the American Community Survey, which is a questionnaire that asks people questions about how they got to work in a specific week. (More details on how it does this below.) We don’t have data, except in rare “Household Travel Surveys”, about trips by bike to school, shopping, and social activities.

It’s comparable across the country – you can get this data for any city.

Here’s how:

  1. Visit the “legacy” American FactFinder and select American Community Survey, operated by the United States Census Bureau.
  2. Select 2005-2009 American Community Survey 5-Year Estimates (or the latest 5-year estimate). This is the most accurate data.
  3. In the right-side menu that appears, click on “Enter a table number”.
  4. In the new window, input the table number “S0801” (“Commuting Characteristics by Sex”) and submit the form. The new window will close and the other window will go to that table.
  5. Now it’s time to select your geography. In the left-side menu, under “Change…” click on “geography (state, county, place…)”
  6. In the window to change your geography, select “Place” as your “Geographic Type”.
  7. Then select the state.
  8. Then select your city and click “Show Result”.
Notes:
  • This data shows all the modes that people who live in that city take to work. It’s highly probable that some people are leaving the city for their jobs on these modes. For example, someone who lives in Rogers Park may ride their bike to work in Evanston.
  • The URL is a permanent link to this dataset. Each city has a unique URL. You should save these as bookmarks so you can easily reference the data later.
  • The question on the survey doesn’t allow multiple choices: “People who used more than one means of transportation to get to work each day were asked to report the one used for the longest distance during the work trip”.

Using Google Fusion Tables to create individual Chicago Ward maps

I wanted to create a map of the 35th Ward boundaries using Google My Maps for a story on Grid Chicago. I planned to create this by taking the Chicago Wards boundary shapefile and exporting just the 35th Ward using QGIS into a KML file. I ran into many problems and ended up using Google Fusion Tables as the final solution.

The problems

First, QGIS creates invalid KML files. Google Earth will tell you this. I opened the KML file in a text editor and removed the offending parts (Google Earth mildly tells you what these are; you can use this validator to get more information).

Second, Google My Maps would not import the KML file. I tried a different browser and a different KML file; a friend ran into the same issue. I reported this problem to Google.

The solution

I uploaded to Google Fusion Tables a KML file containing all wards. I did this instead of uploading the single Ward because, like a database, I can filter values in the column, selecting only the row I want with “ward=35”.

After applying the filter, the map will show the boundary for just that ward. I grab the HTML code for an embeddable map and voila, the article now displays an interactive map of the 35th Ward.

Whenever I want to create a map for a different ward, I go back to this Fusion Table, make a new filter and copy the new HTML code.
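
The filter works the same way as selecting rows in any table. Here is a tiny Python sketch of the idea; it does not use any Fusion Tables API, and the column names and placeholder geometry strings are made up for illustration.

```python
def filter_rows(rows, column, value):
    """Keep only the rows whose column equals the given value."""
    return [row for row in rows if row[column] == value]


# Hypothetical table: one row per ward, geometry stored as KML text.
wards = [
    {"ward": 34, "geometry": "<Polygon>...</Polygon>"},
    {"ward": 35, "geometry": "<Polygon>...</Polygon>"},
    {"ward": 36, "geometry": "<Polygon>...</Polygon>"},
]

# Equivalent of the "ward=35" filter: one row out of the whole table.
ward_35 = filter_rows(wards, "ward", 35)
print(len(ward_35), "row selected")
```

Changing the value is all it takes to select a different ward, which is why uploading all wards once is more convenient than uploading them one at a time.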

A screenshot of the embedded map, showing just 1 of 50 wards, in the Grid Chicago article. 

Elsewhere

I had the same problems with QGIS exporting and uploading the KML files to My Maps the other day when I was creating maps of the abandoned railroads for Monday’s Grid Chicago article. Not thinking about Fusion Tables, I drew the lines on the map with my mouse.

Screenshot of the map of abandoned railroads. 

Just how many taxi vs. bicycle crashes are there? A Google Refine story

On this Chicagoist story about how IDOT will now collect data on doorings (instead of ignoring that crash type as they preferred), I opened the story photo entitled “Cabbie takes down another” by Moe Martinez. His photo caption reads, “you see it alot … thankfully this guy seemed to be relatively ok … coherent and what not.”

I wanted to know just how often “we” see taxi drivers crashing with people riding bicycles. You can’t filter by vehicle type in either my bike crash map or Derek’s, but you can via the Fusion Table.

I decided to get the answer via Google Refine and make a screencast to show you just how quick and powerful a tool it is.

It’s dead simple:

  1. Load a CSV of the data into Google Refine.
  2. Click on the VEH1_SPECL column’s down arrow, then Facet>Text Facet.
  3. In the facet box, sorted alphabetically, find “TAXI/FOR HIRE.”
  4. The number of rows that apply is listed: 353.
  5. Divide 353 by the total number of rows, 4931, multiply by 100, and you get your percentage.

Taxi drivers are involved in just 7.2% of bicycle crashes in Chicago in 2007-2009.
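
If you would rather script it, a text facet is just a count of each distinct value in a column. Here is a minimal Python sketch of steps 1 to 5. The sample rows and the shortened “TAXI” label are made up; only the column name VEH1_SPECL and the figures 353 and 4,931 come from the data above.

```python
import csv
import io
from collections import Counter


def text_facet(rows, column):
    """Count how often each distinct value appears in one column."""
    return Counter(row[column] for row in rows)


def percent(count, total):
    """Share of count in total, as a percentage."""
    return 100.0 * count / total


# Made-up sample rows; the real file has 4,931 rows.
SAMPLE_CSV = """VEH1_SPECL
PERSONAL
TAXI
PERSONAL
PERSONAL
"""

facet = text_facet(csv.DictReader(io.StringIO(SAMPLE_CSV)), "VEH1_SPECL")
total = sum(facet.values())
for value, count in sorted(facet.items()):
    print(value, count, f"{percent(count, total):.1f}%")

# Step 5 with the real numbers from the facet box.
print(f"{percent(353, 4931):.1f}%")
```

To run it on the real data, replace the `io.StringIO(...)` argument with a file opened using `open(path, newline="")`.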

The majority of crashes, at 66%, involve people driving “PERSONAL” vehicles. And 80% of those crashes are with a passenger vehicle that’s not a van, minivan, SUV, truck, or bus (so probably a sedan or coupe). Let’s look at more data.

How many taxis are there and how many personal vehicles are there? Are taxicabs involved in a disproportionately higher number of crashes?

About 781,023 people drive to work, either alone or with someone else, in Chicago (data from 2005-2009 5-year American Community Survey). 1,063,047 households have 1,218,594+ vehicles available in Chicago. Let’s assume the 7,000 taxicabs in Chicago are not counted as a “vehicle available.”* Adding them gives 1,225,594 vehicles in total. If all were on the road at the same time, only 0.57% of them would be taxicabs. But they’re not on the road at the same time. So let’s take the number of people who drive to work and add 7,000 vehicles to it. Of those 788,023 “vehicles” now on the road, just 0.89% are taxicabs.
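
The arithmetic in this paragraph is easy to check in a few lines of Python. The input figures are the ones quoted in this post; the comparison ratio on the last line is my own addition, and it inherits every assumption made above (including the approximate count of 7,000 taxicabs).

```python
TAXIS = 7_000                   # approximate count reported by the Sun-Times
VEHICLES_AVAILABLE = 1_218_594  # ACS "vehicles available" in Chicago
DRIVE_TO_WORK = 781_023         # ACS commuters who drive, alone or carpooling
TAXI_CRASHES = 353              # rows in the taxi facet
ALL_CRASHES = 4_931             # all rows in the crash data


def percent(part, whole):
    """Share of part in whole, as a percentage."""
    return 100.0 * part / whole


# Taxis as a share of every vehicle, if all were on the road at once.
share_all = percent(TAXIS, VEHICLES_AVAILABLE + TAXIS)

# Taxis as a share of commuter vehicles plus taxis.
share_commute = percent(TAXIS, DRIVE_TO_WORK + TAXIS)

# Taxis as a share of bicycle crashes.
crash_share = percent(TAXI_CRASHES, ALL_CRASHES)

print(f"share of all vehicles:      {share_all:.2f}%")
print(f"share of commuter vehicles: {share_commute:.2f}%")
print(f"share of bicycle crashes:   {crash_share:.1f}%")
print(f"crash share / road share:   {crash_share / share_commute:.1f}x")
```

The last line compares the crash share with the road share, without accounting for how many miles or hours each kind of vehicle is driven.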

So it does seem that taxicabs are involved in a disproportionate number of crashes when compared to their presence on the streets. However, taxicabs are most likely driven more miles and for more time than personal vehicles, which makes their drivers’ exposure to people bicycling greater than that of other drivers. (A majority of “personal” trips are very short.)

New data coming soon

I can’t wait to get the 2010 crash data. Here’s why: In 2007, students in a taxi driver training course at Harold Washington College received some education about sharing the road with bicyclists:

A pilot “Share The Road” education module was launched at the taxi training school at Harold Washington College. It includes a 25-30 minute lecture, with discussion. After the pilot, the class will be required for all people training to drive taxis in Chicago. In the future, bicycle questions will be included on the exams required to become taxi drivers. June 2007 MBAC meeting minutes (PDF).

The number of crashes between taxi drivers and people riding bikes jumped from 2007 to 2008, but declined heavily between 2008 and 2009. More data will show us a clearer trend that may lend insight into the impact of the “Share The Road” education module.

*Notes

The question (PDF) on the American Community Survey asks, “How many automobiles, vans, and trucks of one-ton capacity or less are kept at home for use by members of this household?” This may or may not include taxicabs stored at home.

I don’t know how many taxicabs there are in Chicago, but the Chicago Sun-Times reported there are approximately 7,000.

How to create a map in GeoCommons

GeoCommons (GC) is like Google My Maps but more powerful. Read my introduction to GC.

Tips before starting

  • With GC, I’m still figuring out what I must decide before I choose to add or amend something and what I can edit after I’ve made a change.
  • You cannot edit the data table directly.
  • You CAN replace data – click “reupload” – but the columns must match between original and replacement data.
  • Click Save often when making the map. You never know when Adobe Flash is going to quit on you.

One of the busiest locations in Chicago, for people walking, or riding buses and trains. Also a lot of taxi traffic and medium bike traffic. At Adams Street and Riverside Plaza (er, the Chicago River).

Tutorial

  1. Prepare your data. “We support Spreadsheets (as CSVs), Shapefiles, KML, RSS, ATOM and GeoRSS. We also support WMS and Tile services!” GeoCommons has instructions on how to prepare your spreadsheets for geocoding (if not already geocoded; GC will also work with predefined XY coordinates or street addresses). Ensure fields holding numbers have their type set as numeric in the GIS or spreadsheet program or you may run into roadblocks later on when trying to analyze these fields.
  2. If uploading a shapefile, GC requires the SHX and DBF files as well. The PRJ file will also help GC know how to reproject your data on the fly. GC base layer maps are projected in WGS84, just like Google Maps. Without the PRJ file, your data may not show. [Can the user set projection?]
  3. Upload data.
  4. You need to turn your newly uploaded data from a “pending dataset” into a completed dataset. In this process you will tell GC a little more about your data, including which columns hold the XY coordinates (even though it guesses this). You can also change the attribute names and describe the content of those attributes (you can change this later, too).
  5. So click “Next Step” to start this process.
  6. In the “Review Your Geodata” step, you may see that GC has found some additional columns in your dataset. I’m not sure why this is. Delete these columns by selecting the header and clicking Delete Column. Then click Save Changes. You can select multiple columns at a time by holding the Command (Mac) or Control (Windows) keys.
  7. Add metadata; edit attribute names and add descriptions.
  8. You’re done. GC will present you a page with statistics and options to download your data in different formats.
  9. If you want to make a map with more data, follow the process again starting at Step 1. If not, continue.
  10. Make a map! Click “Map Data” or the “Make a Map” button in toolbar.
  11. A map of the world will load. When GC has finished loading your “new layer,” the map will zoom in.
  12. For the pedestrian map, I want to symbolize the data with a single color but change the size of the circle based on how many people were counted there (your data must have this attribute in numeric form – if it doesn’t you may have to reupload your data). Click “Add Data” and then, in the Map Brewer box that appears:
    1. Click on Visual Theme. Click next.
    2. Select the NUMERIC attribute. In the pedestrian data, this is “count.”
    3. Then select whether you want colors or sizes. You cannot change this later; you would have to delete the layer and add it again (using your already uploaded dataset).
    4. Select what type of classification you want. This is entirely up to you and how you want the map to look and based on what data you have. You can change this later.
    5. Choose your shape and color.
  13. Add more data by clicking Add Data button. I think my map would be more useful and interesting if it also showed where the train stations are, a major destination category for people who walk downtown on weekdays. I will symbolize by a solid color. Instead of visual theme, which I chose for the ped counts, I will just choose Points, Lines & Areas. At this time, GC doesn’t allow custom icons.
  14. Re-order layers by dragging them up and down in the layers box. Click on the boxy “handle” to the left of the layer.
  15. Change the layer names by single clicking on the layer name. Press Enter when you’re done.
  16. Change the map name by single clicking on it. Press Enter when you’re done.
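
Step 1’s advice about numeric fields can be checked before you upload. Here is a minimal Python sketch that lists the values in a CSV column that would not parse as numbers. The sample data is made up, with “count” as the column name from the pedestrian example; this is not a GeoCommons tool.

```python
import csv
import io


def non_numeric_values(rows, column):
    """Return (line number, value) pairs that do not parse as numbers."""
    bad = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(rows, start=2):
        value = row[column].strip()
        try:
            float(value)
        except ValueError:
            bad.append((line_no, value))
    return bad


# Made-up sample: thousands separators and "n/a" are common culprits.
SAMPLE_CSV = '''location,count
Adams St,120
Union Station,"1,450"
Clinton CTA,n/a
'''

problems = non_numeric_values(csv.DictReader(io.StringIO(SAMPLE_CSV)), "count")
for line_no, value in problems:
    print(f"line {line_no}: {value!r} is not numeric")
```

Fixing these values in the spreadsheet first should save you a reupload later.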

After creating my pedestrian map, I had some suggestions for GeoCommons, the people who collected the pedestrian count data, and my own map.

  • GeoCommons should add a map preview image for better sharing on Facebook and other websites that look for this.
  • GeoCommons should allow maps to be private after creation – I think after you click save, they are added to a gallery (I could be wrong).
  • The data collectors should add more locations, particularly around Union Station and the two Clinton CTA stations (also between CTA and Metra stations).
  • The data collectors should add “date collected” to the data table.
  • The data collectors should extend survey hours to better match commuting patterns. A majority of the collections end at 5:45 PM while Metra’s rush hour ends just before 7 PM (this is when train departure frequency drops).
  • I should add ridership data to the train stations so we can see which CTA and Metra stations are most used.