
Converting shapefiles to GeoJSON, and other format conversions

To develop the Chicago Bike Map app, I had a problem I thought would be simple to solve: load train lines into a Leaflet-powered map. I had the train lines stored as a polyline shapefile, but Leaflet can only read the GeoJSON format or a string of geographic coordinates representing lines.

I eventually found a solution (I can’t remember how) and I need to share it with you. The converter can do more than turn ESRI shapefiles into GeoJSON: it can reproject the data during the conversion, and it can convert between several formats.

The site is called MyGeodata Converter. You upload a ZIP file of geographic files: a .shp and its companion files (.prj, .dbf, .shx), .kml, or .gpx. Take the Chicago Transit Authority train lines shapefile straight from the City of Chicago’s open data portal. It downloads as a ZIP of the shapefile and its companion files, so you can upload it to the Converter as-is. The Converter will unzip it and read the data; it will even identify the projection system (for Chicago-based geographic data, it’s common to use NAD83 Illinois StatePlane East FIPS 1201 Feet, SRID 102671, which is the same as SRID 3435).

The Converter can convert to any of the following formats, keeping the same projection or reprojecting to a new one, and it accepts SQL statements to extract a subset of the data:

  • ESRI shapefile
  • GML
  • KML, KMZ
  • GeoJSON
  • Microstation DGN
  • MapInfo File
  • GPX
  • CSV
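If you’d rather script the GeoJSON side yourself, here is a minimal Python sketch (standard library only) of what the output structure looks like. It assumes the line coordinates have already been read from the shapefile and reprojected to WGS84 longitude/latitude; reading the .shp and reprojecting from StatePlane feet are the parts the Converter does for you and are not shown. The function name `lines_to_geojson` is hypothetical.

```python
import json

def lines_to_geojson(lines):
    """Build a GeoJSON FeatureCollection of LineStrings.

    `lines` is a list of (properties_dict, [(lon, lat), ...]) pairs.
    Coordinates are assumed to already be WGS84 longitude/latitude,
    which GeoJSON (RFC 7946) expects; no reprojection happens here.
    """
    features = []
    for props, coords in lines:
        features.append({
            "type": "Feature",
            "properties": dict(props),
            "geometry": {
                "type": "LineString",
                # GeoJSON positions are [longitude, latitude], in that order
                "coordinates": [[lon, lat] for lon, lat in coords],
            },
        })
    return {"type": "FeatureCollection", "features": features}
```

Writing `json.dumps(lines_to_geojson(...))` to a .geojson file gives you something Leaflet’s `L.geoJSON` layer can load.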

New map tutorials

A screenshot of using BatchGeocode to take a spreadsheet of addresses and turn it into a nice map. 

Over at Grid Chicago, my other blog (which sucks up all the time I used to spend on this one), I’ve recently written tutorials on how to create online maps, first with Google My Maps (which Google renamed My Places) and second with BatchGeocode (which renamed itself BatchGeo because it now does more than geocoding).

Google My Maps is primitive as far as map making goes, but it has the lowest learning curve and it’s easy: you just click on the map where you want something to go and fill in the info window. Read that tutorial.

BatchGeocode is slightly more advanced: it takes your tabular data (most likely from a spreadsheet) and puts it on a map you can embed on your website. It also has some paid features. Read the tutorial for BatchGeocode.

I’ve also written about using BatchGeocode with QGIS, as it was once the only way to geocode in QGIS. However, BatchGeocode no longer gives you a results table with latitude and longitude (apparently this is against Google Maps’s terms of service), so I updated the article to cover other methods for geocoding in QGIS.

I will be writing two more tutorials, one about GeoCommons and one about Google Fusion Tables.

How to upload shapefiles to Google Fusion Tables

It is now possible to upload a shapefile (and its companion files SHX, PRJ, and DBF) to Google Fusion Tables (GFT).

Before we go any further, keep in mind that the application that does this will only process 100,000 rows. Additionally, GFT only gives each user 200 MB of storage, and as far as I can see it doesn’t tell you how much you’ve used.

  1. Log in to your Google account (at Gmail, or at GFT).
  2. Prepare your data. Ensure it has fewer than 100,000 rows.
  3. ZIP up your dataX.shp, dataX.shx, dataX.prj, and dataX.dbf. Use WinZip for Windows, or for Mac, right-click the selection of files and select “Compress 4 items”.
  4. Visit the Shape to Fusion website. You will have to authorize the web application to “grant access” to your GFT tables. It needs this access so that after the web application processes your data, it can insert it into GFT.
  5. If you want a Centroid Geometry column or a Simplified Geometry column added, click “Advanced Options” and check their checkboxes – see notes below for an explanation.
  6. Choose the file to upload and click Upload.
  7. Leave the window open until it says it has processed all of the rows. It will report “Processed Y rows and inserted Y rows”. You will then be given a link to the table the web application created in GFT.
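Steps 2 and 3 can be scripted. The sketch below assumes your four files share a base name (the hypothetical `dataX`). It reads the record count from the .dbf header (in the dBASE format, bytes 4 to 7 hold it as a little-endian 32-bit integer) and zips the companion files only if there are fewer than 100,000 rows. The function names are my own, not part of Shape to Fusion.

```python
import os
import struct
import zipfile

def dbf_record_count(dbf_path):
    """Read the number of records from a dBASE (.dbf) file header."""
    with open(dbf_path, "rb") as f:
        header = f.read(8)
    # Bytes 4-7: record count, little-endian unsigned 32-bit integer
    return struct.unpack("<I", header[4:8])[0]

def zip_shapefile(base_path, zip_path, max_rows=100_000):
    """Zip base_path.shp/.shx/.prj/.dbf after checking the row limit."""
    rows = dbf_record_count(base_path + ".dbf")
    if rows >= max_rows:
        raise ValueError(f"{rows} rows; Shape to Fusion needs fewer than {max_rows}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for ext in (".shp", ".shx", ".prj", ".dbf"):
            # Store each file under its bare name, with no folder prefix
            zf.write(base_path + ext, arcname=os.path.basename(base_path + ext))
    return rows
```

For example, `zip_shapefile("dataX", "dataX.zip")` returns the row count and leaves a ZIP ready for step 6.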

Sample Data

If you’re looking to give this a try and see results quickly, try some sample data from the City of Chicago data portal:

Notes

Many times while using Shape to Fusion, after I chose the file to upload and clicked Upload, I had to grant access to the web application again and start over (choose the file and click Upload a second time).

Centroid Geometry – This creates a column with the geographic coordinates of each polygon’s centroid. The coordinates are listed in the original projection system, so if your projection is in feet, the values will be in feet. This is a function that can easily be performed in the free and open source QGIS, where you can also reproject files to get latitude and longitude values (in the WGS84 projection, EPSG 4326). The centroid value is wrapped in KML syntax in the field: “<Point><coordinates>X,Y</coordinates></Point>”.
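For reference, here is a short sketch of the standard area-weighted polygon centroid (built on the shoelace formula), wrapped in the same KML syntax. I don’t know exactly how Shape to Fusion computes its centroids, so treat this as an illustration rather than a reproduction. It works in whatever units the input uses, which is why StatePlane input gives feet.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon.

    `ring` is a list of (x, y) vertices and may or may not repeat the
    first vertex at the end. Works for clockwise or counterclockwise rings.
    """
    if ring[0] == ring[-1]:
        ring = ring[:-1]
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3.0 * area2), cy / (3.0 * area2)

def kml_point(x, y):
    """Wrap a coordinate pair in the KML syntax the centroid column uses."""
    return f"<Point><coordinates>{x},{y}</coordinates></Point>"
```

`kml_point(*polygon_centroid(ring))` yields a value in the same form as the Centroid Geometry column.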

Simplified Geometry – A geometry column is automatically created by the web application (or GFT, I’m not sure). This function will create a simpler version of that geometry, with fewer lines and vertices. It also creates columns listing the vertex counts for the simplified and regular geometry columns.
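I don’t know which simplification algorithm the web application uses. A common choice is Douglas-Peucker: find the vertex farthest from the segment joining the endpoints; if it is farther than a tolerance, keep it and repeat on both halves, otherwise drop every interior vertex. This sketch illustrates that technique, not Shape to Fusion’s code; the tolerance is in the data’s own units.

```python
import math

def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamped to its endpoints
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    """Douglas-Peucker line simplification."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the split vertex
```

Comparing `len(points)` with `len(simplify(points, tol))` gives the two vertex counts the application reports.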

Just how many taxi vs. bicycle crashes are there? A Google Refine story

On this Chicagoist story about how IDOT will now collect data on doorings (instead of ignoring that crash type as they preferred), I opened the story photo entitled “Cabbie takes down another” by Moe Martinez. His photo caption reads, “you see it alot … thankfully this guy seemed to be relatively ok … coherent and what not.”

I wanted to know just how often “we” see taxi drivers crashing with people riding bicycles. You can’t filter by vehicle type in either my bike crash map or Derek’s, but you can via the Fusion Table.

I decided to get the answer via Google Refine and make a screencast to show you just how quick and powerful a tool it is.

It’s dead simple:

  1. Load a CSV of the data into Google Refine.
  2. Click on the VEH1_SPECL column’s down arrow, then Facet>Text Facet.
  3. In the facet box, sorted alphabetically, find “TAXI/FORE HIRE.”
  4. The number of rows that apply is listed: 353.
  5. Divide 353 by the total number of rows, 4931, multiply by 100, and you get your percentage.
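If you’d rather not open Google Refine, the same text-facet count takes a few lines of Python. The file name `crashes.csv` below is hypothetical; the column name and value are the ones from the steps above.

```python
import csv
from collections import Counter

def facet_share(csv_path, column, value):
    """Return (count, total, percent) for rows where `column` == `value`.

    Like a Google Refine text facet: tally every distinct value in the
    column, then look up the one you care about.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    facet = Counter(row[column] for row in rows)
    count, total = facet[value], len(rows)
    return count, total, 100.0 * count / total
```

Calling `facet_share("crashes.csv", "VEH1_SPECL", "TAXI/FORE HIRE")` on the post’s data should give 353 of 4931 rows, about 7.16%, which rounds to the 7.2% below.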

Taxi drivers are involved in just 7.2% of bicycle crashes in Chicago in 2007-2009.

The majority of crashes, at 66%, involve people driving “PERSONAL” vehicles. And 80% of those crashes are with a passenger vehicle that’s not a van, minivan, SUV, truck, or bus (so probably a sedan or coupe). Let’s look at more data.

How many taxis are there and how many personal vehicles are there? Are taxicabs involved in a disproportionately higher number of crashes?

About 781,023 people drive to work, either alone or with someone else, in Chicago (data from the 2005-2009 5-year American Community Survey). 1,063,047 households have at least 1,218,594 vehicles available in Chicago. Let’s assume the 7,000 taxicabs in Chicago are not counted as a “vehicle available.”* That’s 1,225,594 vehicles in total. If all were on the road at the same time, only 0.57% of them would be taxicabs. But they’re not all on the road at the same time, so let’s take the number of people who drive to work and add the 7,000 taxicabs to it. Of those 788,023 “vehicles” now on the road, just 0.89% are taxicabs.
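Here is the arithmetic from this paragraph as a quick Python check, using the ACS and Sun-Times figures quoted in this post.

```python
def share(part, whole):
    """Percentage of `whole` made up by `part`."""
    return 100.0 * part / whole

taxis = 7_000                   # Sun-Times estimate (see notes)
household_vehicles = 1_218_594  # ACS 2005-2009, vehicles available
drive_to_work = 781_023         # ACS 2005-2009, people driving to work

print(round(share(taxis, household_vehicles + taxis), 2))  # → 0.57 (every vehicle on the road)
print(round(share(taxis, drive_to_work + taxis), 2))       # → 0.89 (commuters plus taxis)
print(round(share(353, 4_931), 2))                         # → 7.16 (taxi share of bike crashes)
```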

So it does seem that taxicabs are involved in a disproportionate number of crashes compared to their presence on the streets. However, taxicabs are most likely driven more miles and for more hours than personal vehicles, which gives their drivers greater exposure to people bicycling than drivers of other vehicles. (A majority of “personal” trips are very short.)

New data coming soon

I can’t wait to get the 2010 crash data. Here’s why: In 2007, students in a taxi driver training course at Harold Washington College received some education about sharing the road with bicyclists:

A pilot “Share The Road” education module was launched at the taxi training school at Harold Washington College. It includes a 25-30 minute lecture, with discussion. After the pilot, the class will be required for all people training to drive taxis in Chicago. In the future, bicycle questions will be included on the exams required to become taxi drivers. June 2007 MBAC meeting minutes (PDF).

The number of crashes between taxi drivers and people riding bikes jumped from 2007 to 2008, but declined sharply between 2008 and 2009. More data will show a clearer trend that may lend insight into the impact of the “Share The Road” education module.

*Notes

The question (PDF) on the American Community Survey asks, “How many automobiles, vans, and trucks of one-ton capacity or less are kept at home for use by members of this household?” This may or may not include taxicabs stored at home.

I don’t know how many taxicabs there are in Chicago, but the Chicago Sun-Times reported there are approximately 7,000.

How to create a map in GeoCommons

GeoCommons (GC) is like Google My Maps but more powerful. Read my introduction to GC.

Tips before starting

  • With GC, I’m still figuring out which choices must be made before adding or amending something and which can be edited afterward.
  • You cannot edit the data table directly.
  • You CAN replace data – click “reupload” – but the columns must match between original and replacement data.
  • Click Save often when making the map. You never know when Adobe Flash is going to quit on you.

One of the busiest locations in Chicago for people walking or riding buses and trains, with a lot of taxi traffic and moderate bike traffic. At Adams Street and Riverside Plaza (er, the Chicago River).

Tutorial

  1. Prepare your data. “We support Spreadsheets (as CSVs), Shapefiles, KML, RSS, ATOM and GeoRSS. We also support WMS and Tile services!” GeoCommons has instructions on how to prepare your spreadsheets for geocoding if they’re not already geocoded (GC will work with predefined XY coordinates or with street addresses). Ensure fields holding numbers have their type set to numeric in your GIS or spreadsheet program, or you may run into roadblocks later when trying to analyze these fields.
  2. If uploading a shapefile, GC requires the SHX and DBF files as well. The PRJ file will also help GC know how to reproject your data on the fly. GC base layer maps are projected in WGS84, just like Google Maps. Without the PRJ file, your data may not show. [Can the user set projection?]
  3. Upload data.
  4. You need to turn your newly uploaded data from a “pending dataset” into a completed dataset. In this process you will tell GC a little more about your data, including which columns hold the XY coordinates (even though it guesses this). You can also change the attribute names and describe the content of those attributes (you can change these later, too).
  5. So click “Next Step” to start this process.
  6. In the “Review Your Geodata” step, you may see that GC has found some additional columns in your dataset. I’m not sure why this is. Delete these columns by selecting the header and clicking Delete Column. Then click Save Changes. You can select multiple columns at a time by holding the Command (Mac) or Control (Windows) keys.
  7. Add metadata; edit attribute names and add descriptions.
  8. You’re done. GC will present you a page with statistics and options to download your data in different formats.
  9. If you want to make a map with more data, follow the process again starting at Step 1. If not, continue.
  10. Make a map! Click “Map Data” or the “Make a Map” button in toolbar.
  11. A map of the world will load. When GC has finished loading your “new layer,” the map will zoom in.
  12. For the pedestrian map, I want to symbolize the data with a single color but change the size of the circle based on how many people were counted there (your data must have this attribute in numeric form; if it doesn’t, you may have to reupload your data). Click “Add Data” and then, in the Map Brewer box that appears:
    1. Click on Visual Theme. Click next.
    2. Select the NUMERIC attribute. In the pedestrian data, this is “count.”
    3. Then select whether you want to vary colors or sizes. You cannot change this later; you would have to delete the layer and add it again (using your already uploaded dataset).
    4. Select what type of classification you want. This is entirely up to you and how you want the map to look and based on what data you have. You can change this later.
    5. Choose your shape and color.
  13. Add more data by clicking Add Data button. I think my map would be more useful and interesting if it also showed where the train stations are, a major destination category for people who walk downtown on weekdays. I will symbolize by a solid color. Instead of visual theme, which I chose for the ped counts, I will just choose Points, Lines & Areas. At this time, GC doesn’t allow custom icons.
  14. Re-order layers by dragging them up and down in the layers box. Click on the boxy “handle” to the left of the layer.
  15. Change the layer names by single clicking on the layer name. Press Enter when you’re done.
  16. Change the map name by single-clicking on it. Press Enter when you’re done.
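Before uploading, it can save you a reupload to confirm that a column like the pedestrian “count” really is numeric. This sketch flags values that won’t parse as numbers; the column name is whatever your data uses (“count” is just the example from the pedestrian data), and the file name in the usage line is hypothetical.

```python
import csv

def non_numeric_values(csv_path, column):
    """Return the values in `column` that would not parse as numbers.

    An empty list suggests the column is safe to mark as numeric;
    anything returned (e.g. "1,204", "n/a", or blanks) needs cleanup.
    """
    bad = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            value = row[column].strip()
            try:
                float(value)
            except ValueError:
                bad.append(value)
    return bad
```

For example, `non_numeric_values("ped_counts.csv", "count")` should return `[]` before you set that field to numeric and upload.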

After creating my pedestrian map, I had some suggestions for GeoCommons, the people who collected the pedestrian count data, and my own map.

  • GeoCommons should add a map preview image for better sharing on Facebook and other websites that look for this.
  • GeoCommons should allow maps to be private after creation – I think after you click save, they are added to a gallery (I could be wrong).
  • The data collectors should add more locations, particularly around Union Station and the two Clinton CTA stations (also between CTA and Metra stations).
  • The data collectors should add “date collected” to the data table.
  • The data collectors should extend survey hours to better match commuting patterns. A majority of the collections end at 5:45 PM while Metra’s rush hour ends just before 7 PM (this is when train departure frequency drops).
  • I should add ridership data to the train stations so we can see which CTA and Metra stations are most used.

Converting Google My Maps to KML and GPX

Convert your routes that you made in Google My Maps to GPX so that you can view them on Garmin GPS devices, or upload them to MapMyRide.

  1. Access your My Map. Your My Map must have lines or routes in it. It appears that a My Map with only points doesn’t convert correctly.
  2. Click on View in Google Earth. Your web browser will download a KML file. It may automatically open in Google Earth, but this is not necessary.
  3. Visit GPS Visualizer to convert your KML file to GPX.
  4. Select GPX as your output.
  5. For the input, choose the KML file you just downloaded from Google My Maps.
  6. Click Convert. Your file will be uploaded and your GPX file will be presented for download on the next page.
  7. Download your GPX file from the link on the page.
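GPS Visualizer handles this conversion for you. To show what it involves, here is a minimal sketch that turns each KML LineString into a GPX 1.1 track. It ignores points and polygons, and it matches tags by local name so it works whichever KML namespace version the export uses. `kml_lines_to_gpx` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"

def _find_all(elem, local_name):
    """All descendants whose tag matches, ignoring any '{namespace}' prefix."""
    return [e for e in elem.iter() if e.tag.rsplit("}", 1)[-1] == local_name]

def kml_lines_to_gpx(kml_data):
    """Convert each KML LineString into a GPX 1.1 track (points are skipped)."""
    root = ET.fromstring(kml_data)
    gpx = ET.Element("gpx", xmlns=GPX_NS, version="1.1", creator="kml_lines_to_gpx")
    for placemark in _find_all(root, "Placemark"):
        names = _find_all(placemark, "name")
        name = (names[0].text or "") if names else ""
        for line in _find_all(placemark, "LineString"):
            trk = ET.SubElement(gpx, "trk")
            ET.SubElement(trk, "name").text = name
            seg = ET.SubElement(trk, "trkseg")
            for coords in _find_all(line, "coordinates"):
                # KML tuples are "lon,lat[,alt]", separated by whitespace
                for tuple_text in (coords.text or "").split():
                    lon, lat = tuple_text.split(",")[:2]
                    ET.SubElement(seg, "trkpt", lat=lat, lon=lon)
    return ET.tostring(gpx, encoding="unicode")
```

Passing the file’s raw bytes, e.g. `kml_lines_to_gpx(open("mymap.kml", "rb").read())`, lets the XML parser honor the file’s own encoding declaration.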

You can now transfer the GPX file to your GPS device, or upload it to MapMyRide. I confirmed that MapMyRide successfully imports the Google My Map I converted following these instructions.

How to convert GTFS to GIS shapefiles and KML

This tutorial will teach you how to convert any transit agency’s General Transit Feed Specification (GTFS) data into ESRI ArcGIS-compatible shapefiles (.shp), KML, or XML. This is simple to do because GTFS data is essentially a collection of CSV (comma separated values) text files (really, really large text files).

Note: I don’t know how to do the reverse, converting shapefiles or other geodata into GTFS data. I’m not sure if this is possible and I’m still investigating it. If you have tips, let me know.

Converting GTFS to GIS shapefiles

Instructions require the use of ArcGIS (Windows only) and a free plugin called ET GeoWizards GIS for any version of ArcGIS. I do not have instructions for Mac users at this time.

I wrote these instructions while converting the Chicago Transit Authority’s GTFS files into shapefiles based on a reader’s request. “Field names” are quoted and layer names are italicized.

  1. Download the GTFS data you want. Find data from agencies around the world (although not many from Europe) on GTFS Data Exchange.
  2. Import the shapes.txt file into ArcGIS using Tools>Add XY Data. Specify Y = “shape_pt_lat” and X = “shape_pt_lon”.
  3. Using ET GeoWizards GIS tools, in the Convert tab, convert the points shapefile to polyline.
  4. Select the shapes layer in the wizard, then create a destination file. Click Next.
  5. Select the “shape_id” field.
  6. Click the checkbox next to Order and select the field “shape_pt_sequence” and click Finish.
  7. Depending on the number of records, this may take a while (the CTA has 466,000 shape points).
  8. The new shapefile will be added to your Table of Contents and appear in your map.
  9. Import the trips.txt and routes.txt files. Inspect them for any NULL values in the “route_id” field; you will be using this field to join the routes and trips tables. ArcGIS may have imported them incorrectly; the text files will show the correct data. If NULL values appear, follow steps 10 and 11 and continue. If not, follow steps 10 and 12 and continue. This happens because ArcGIS inspects only some of the rows, decides the field holds integers, and ignores the text values, even though the field actually contains text.
  10. Export the text files as DBF files so that ArcGIS operates on them better. Then remove the text files from the Table of Contents.
  11. (Only if NULL values appear) Go into editing mode and fix the NULL values you noticed in step 9. You may have to make a new column with a more forgiving data type (string) and then copy the “route_id” column into the new column. Then continue to step 12.
  12. Join routes and trips based on the field “route_id” – export as trips_routes.dbf
  13. Add a new column to shapes.shp called “shape_id2”, with data type double 18, 11. This is so we can perform step 14. Use the field calculator to copy the values from “shape_id” (also known as ET_ID) to “shape_id2”.
  14. Join trips_routes with shapes into routes_poly based on the field “shape_id” (and “shape_id2”).
  15. Dissolve routes_poly on “route_id.” Make sure all selections are cleared. Use statistics/summary fields: “route_long,” “route_url.” Save as routes_diss.shp
  16. Inspect the new shapefile to ensure it was created correctly. You may notice that some bus routes don’t have names. Since these routes are well documented on the CTA website, I’m not going to fill in their names.
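If you don’t have ArcGIS, the same logic (points grouped by “shape_id” and ordered by “shape_pt_sequence”, then joined to routes through trips.txt and collected per “route_id”) can be sketched in plain Python. This illustrates the join-and-dissolve idea using the GTFS column names; it returns coordinate lists rather than writing a shapefile, and the function names are hypothetical.

```python
import csv
from collections import defaultdict

def read_rows(path):
    """Read a GTFS .txt file (CSV with a header row); tolerates a UTF-8 BOM."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))

def shapes_to_polylines(shape_rows):
    """Points-to-polyline step: group by shape_id, order by shape_pt_sequence."""
    points = defaultdict(list)
    for row in shape_rows:
        points[row["shape_id"]].append((
            int(row["shape_pt_sequence"]),
            float(row["shape_pt_lon"]),
            float(row["shape_pt_lat"]),
        ))
    return {shape_id: [(lon, lat) for _, lon, lat in sorted(pts)]
            for shape_id, pts in points.items()}

def polylines_by_route(shape_rows, trip_rows, route_rows):
    """Join shapes to routes through trips, then collect lines per route_id.

    route_id is kept as text throughout, so IDs such as "Red" are never
    coerced to integers (the cause of the NULLs in step 9).
    """
    lines = shapes_to_polylines(shape_rows)
    known_routes = {r["route_id"] for r in route_rows}
    shape_to_route = {}
    for trip in trip_rows:
        shape_id = trip.get("shape_id")  # shape_id is optional in trips.txt
        if shape_id in lines and trip["route_id"] in known_routes:
            # Many trips share one shape; each shape is kept once
            shape_to_route[shape_id] = trip["route_id"]
    by_route = defaultdict(list)
    for shape_id, route_id in shape_to_route.items():
        by_route[route_id].append(lines[shape_id])
    return dict(by_route)
```

Usage: `polylines_by_route(read_rows("shapes.txt"), read_rows("trips.txt"), read_rows("routes.txt"))`.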


Converting GTFS to KML

After you have it in shapefile form, converting to KML is easy – follow these instructions for using QGIS. Or if you want to skip the shapefile-creation process (quite involved!), you can use KMLWriter, a Python script. Also, I think the latest version of ArcGIS has built-in KML exporting.
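For a sense of what such a script produces, here is a minimal sketch that writes named polylines (for example, per-route coordinate lists built from a GTFS feed) as KML placemarks. It is not KMLWriter’s code, and the function name is hypothetical.

```python
from xml.sax.saxutils import escape

def polylines_to_kml(named_lines):
    """Write (name, [(lon, lat), ...]) pairs as KML LineString placemarks."""
    placemarks = []
    for name, coords in named_lines:
        # KML wants "lon,lat" tuples separated by spaces
        coord_text = " ".join(f"{lon},{lat}" for lon, lat in coords)
        placemarks.append(
            "<Placemark>"
            f"<name>{escape(name)}</name>"
            f"<LineString><coordinates>{coord_text}</coordinates></LineString>"
            "</Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        + "".join(placemarks)
        + "</Document></kml>"
    )
```

Save the returned string to a .kml file with UTF-8 encoding and it should open in Google Earth.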

Converting GTFS to XML

If you want to convert the GTFS data (which are essentially comma-separated value – CSV – files) to XML, that’s easier and you can avoid using GIS programs.

  • First try Mr. Data Converter (very user friendly).
  • If that doesn’t work, try this website form on Creativyst. I tested it by converting the CTA’s smallest GTFS table, frequencies.txt, and it worked properly. However, it has a data size limit. (User friendly.)
  • Next try csv2xml, a command line tool. (Not user friendly.)
  • You can also use Microsoft Excel, but read these tips and caveats first. (I haven’t found a Microsoft application I like or think is user friendly.)
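Because GTFS tables are plain CSV, Python’s standard library can also do this conversion. This sketch uses the column headers as element names (GTFS headers such as `trip_id` and `headway_secs` are valid XML names); the `records`/`record` tag names are arbitrary choices.

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(csv_text, root_tag="records", row_tag="record"):
    """Turn CSV text into XML: one element per row, one child per column."""
    root = ET.Element(root_tag)
    # Drop a leading byte-order mark so the first header is a clean tag name
    for row in csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff"))):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            ET.SubElement(record, column).text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `csv_to_xml(open("frequencies.txt", encoding="utf-8").read())` produces one `<record>` per frequency row.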

How to geocode a single address in QGIS

Since the last time I wrote about how to use BatchGeocode.com to perform pseudo-geocoding tasks in QGIS, there have been considerable improvements in this multi-platform, free, and open source GIS software. Now, geocoding (turning addresses into coordinates) is more automatic, albeit difficult to set up. (Okay, this has been around since June 2009 and I just found out about it in October 2010.)

Once you install all the components, you’ll never have to do this again.

This method can only geocode one address at a time, but it will geocode all of the addresses into a single shapefile.

  1. Download QGIS.
  2. Download and install Python SetupTools. This includes the easy_install tool, which will download a Python library you need, simplejson. On Mac you will have to use the Terminal (Applications>Utilities). Email me if you run into problems.
  3. Install simplejson. In the command line (Terminal for Mac; in Windows press Start>Run>“cmd”>Enter), type “easy_install simplejson”.
  4. Download the GeoCode plugin by Alessandro Pasotti via QGIS>Plugins>Fetch Python Plugins. You may have to load additional repositories to see it.
  5. Install geopy. In the command line (like step 3), type “easy_install geopy”.
  6. Specify your project’s projection in File>Project Properties.
  7. Get a Google Maps API key and tell the GeoCode plugin about it (QGIS>Plugins>GeoCode>Settings). You will need a Google account. If you don’t have your own domain name, you can just enter “google.com” when it asks for your domain.
  8. Geocode your first address by clicking on Plugins>GeoCode>Geocode. Type the full address (e.g. 121 N LaSalle Street, Chicago, IL for City Hall).
  9. The geocoded address will then appear in your Layers list as its own shapefile. All addresses geocoded (or reverse geocoded) in this project will appear in the same layer (therefore same attribute table).
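The plugin talks to Google for you. If you’re curious what a single geocoding request looks like, here is a sketch against Google’s current Geocoding web service (the JSON endpoint with `address` and `key` parameters), which may differ from what the 2010 plugin and geopy used internally. `YOUR_API_KEY` is a placeholder, and Google’s terms of service still apply to how you use the results.

```python
import json
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode_request_url(address, api_key):
    """Build a Google Geocoding API request URL for one address."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})

def first_location(response_text):
    """Return (lat, lng) of the first result in a Geocoding API JSON response."""
    data = json.loads(response_text)
    if data.get("status") != "OK" or not data.get("results"):
        raise ValueError(f"geocoding failed: {data.get('status')}")
    location = data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]
```

Usage: build `url = geocode_request_url("121 N LaSalle Street, Chicago, IL", "YOUR_API_KEY")`, fetch it with `urllib.request.urlopen(url).read().decode()`, and pass the text to `first_location`.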

Geocoding will be available each and every time you use QGIS in the future on that workstation.

Tips

  • When you’re done geocoding, save your results as a shapefile (right-click the layer and click “Save as shapefile”). Twice I’ve lost my results after saving the project and quitting QGIS: when I reopened the project, the results layer was still listed but contained no data.
  • Add a “name” column to the GeoCoding Plugin Results layer’s attribute table (toggle editing first). You can then type in the name of the building or destination at the address you geocoded. Edit the layer’s properties to have that name appear as a label for the point.

A map I made with QGIS showing three geocoded points of interest in Chicago. Data from City of Chicago’s GIS team.

 

How to geocode multiple addresses in QGIS

UPDATE April 11, 2013: Updated the directions because the “Add delimited text layer” function moved from the Plugins menu to the Layer menu.

UPDATE March 24, 2011: I updated the directions to use GPS Visualizer instead of BatchGeocode.com because BG stopped giving geographic coordinates in its output.

Get directions on geocoding a single address in QGIS with a plugin.

QGIS is an open-source Geographic Information Systems (GIS) application that has been gaining ground since 2004. It runs on all operating systems (it began as a Linux project) and you can download it for free.

I use it often because ESRI doesn’t make the popular ArcGIS software for Mac. That’s unfortunate, but like I said here, software, technology and mapping issues can be easily overcome – we can use QGIS to create maps. QGIS, though, is missing one major feature for basic map building: geocoding.

Here’s a step-by-step tutorial on how to bring multiple street addresses and their XY coordinates into your QGIS map en masse.
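Once you have coordinates for each address, QGIS’s Layer>Add Delimited Text Layer can read them from a CSV with separate X and Y columns. Here is a minimal sketch of writing such a file; the column names are my own choice, and the coordinates are assumed to be WGS84 longitude/latitude.

```python
import csv

def write_points_csv(path, points):
    """Write (name, address, lon, lat) tuples to a CSV QGIS can load.

    In the delimited text dialog, choose "x" as the X field and "y" as
    the Y field, and use WGS84 (EPSG:4326) for longitude/latitude.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "address", "x", "y"])
        for name, address, lon, lat in points:
            writer.writerow([name, address, lon, lat])
```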

© 2014 Steven Can Plan
