# Tag: GIS

A common way to show the distribution of places (like grocery stores) is to use a heat map. The map will appear “hotter” where there are many grocery stores and “colder” where there are few grocery stores. This kind of map can be useful to show gaps in distribution or a neighborhood that has a lot of grocery stores.

One issue with that kind of heat map is that the coverage areas change their shape and color if you zoom in, since the algorithm that clusters or determines what’s “nearby” or dense has fewer locations to analyze.

I prefer to use grids in the shape of square tiles, since Chicago is grid-oriented city and the vast majority of our streets and our routes move along east-west and north-south lines. The map above shows the location of subjects and topics of news articles in the Chicago Cityscape database.

I use PostGIS to set up most of my spatial data before visualizing it in QGIS.

This tutorial shows the two steps to using PostGIS to (1) create a grid based on an existing area (polygon), (2) assigning each point location to a tile in that grid and counting the number of locations in that tile.

If you don’t have PostGIS installed, you should install it if you work with spatial data a lot. It is much, much faster at performing most of the tasks you use QGIS or ArcGIS to perform. Both QGIS and ArcGIS can read and edit data stored in PostGIS.

Additionally, there is a function within QGIS that can create grids, and another function that can do comparisons by location and count/summarize, so all of this can be done without PostGIS.

For this tutorial, you will need a single polygon with which to create the grid. I used the boundary of the City of Chicago limits.

1. Create a grid based on an existing area

1.a. Add a new function to PostGIS

To create a grid, you need a function that draws the tiles based on the polygon. I got this from The Spatial Database Advisor.

``````-- Create required type
DROP   TYPE IF EXISTS T_Grid CASCADE;
CREATE TYPE T_Grid AS (
gcol  int4,
grow  int4,
geom geometry
);
-- Drop function is exists
DROP FUNCTION IF EXISTS ST_RegularGrid(geometry, NUMERIC, NUMERIC, BOOLEAN);
-- Now create the function
CREATE OR REPLACE FUNCTION ST_RegularGrid(p_geometry   geometry,
p_TileSizeX  NUMERIC,
p_TileSizeY  NUMERIC,
p_point      BOOLEAN DEFAULT TRUE)
RETURNS SETOF T_Grid AS
\$BODY\$
DECLARE
v_mbr   geometry;
v_srid  int4;
v_halfX NUMERIC := p_TileSizeX / 2.0;
v_halfY NUMERIC := p_TileSizeY / 2.0;
v_loCol int4;
v_hiCol int4;
v_loRow int4;
v_hiRow int4;
v_grid  T_Grid;
BEGIN
IF ( p_geometry IS NULL ) THEN
RETURN;
END IF;
v_srid  := ST_SRID(p_geometry);
v_mbr   := ST_Envelope(p_geometry);
v_loCol := trunc((ST_XMIN(v_mbr) / p_TileSizeX)::NUMERIC );
v_hiCol := CEIL( (ST_XMAX(v_mbr) / p_TileSizeX)::NUMERIC ) - 1;
v_loRow := trunc((ST_YMIN(v_mbr) / p_TileSizeY)::NUMERIC );
v_hiRow := CEIL( (ST_YMAX(v_mbr) / p_TileSizeY)::NUMERIC ) - 1;
FOR v_col IN v_loCol..v_hiCol Loop
FOR v_row IN v_loRow..v_hiRow Loop
v_grid.gcol := v_col;
v_grid.grow := v_row;
IF ( p_point ) THEN
v_grid.geom := ST_SetSRID(
ST_MakePoint((v_col * p_TileSizeX) + v_halfX,
(v_row * p_TileSizeY) + V_HalfY),
v_srid);
ELSE
v_grid.geom := ST_SetSRID(
ST_MakeEnvelope((v_col * p_TileSizeX),
(v_row * p_TileSizeY),
(v_col * p_TileSizeX) + p_TileSizeX,
(v_row * p_TileSizeY) + p_TileSizeY),
v_srid);
END IF;
RETURN NEXT v_grid;
END Loop;
END Loop;
END;
\$BODY\$
LANGUAGE plpgsql IMMUTABLE
COST 100
ROWS 1000;``````

The ST_RegularGrid function works in the same projection as your source data.

1.b. Create a layer that has all of the tiles for just the Chicago boundary

``````--This creates grids of 1,320 feet square (a 2x2 block size in Chicago)
SELECT gcol, grow, geom
into b_chicagoboundary_grid_1320squared
FROM ST_RegularGrid((select geom from chicagoboundary where gid = 1), 1320, 1320,FALSE);``````

In that query, “1320” is a distance in feet for both the X and Y planes, as the “chicagoboundary” geometry is projected in Illinois StatePlane FIPS East (Feet) (EPSG/SRID 3435).

2. Assign each point location to a tile in that grid and count the number of locations in each tile

Now you’ll need a table that has POINT-type geometries in it. For the map in this tutorial, I used a layer of location-based news articles that are used in Chicago Cityscape to highlight local developments.

``````SELECT grid.id, count(*) as count, grid.geom
INTO news_grid
FROM news_articles, b_chicagoboundary_grid_1320squared AS grid
WHERE st_intersects(news_articles.geom, grid.geom)
GROUP by grid.id;``````

This query will result in a table with three columns:

1. The ID of the tile, which is a primary key field.
2. The number of news articles in that tile.
3. The POLYGON geometry of that tile.

This infographic visually compares the difference between running a PostGIS comparison query like ST_Intersects on a large shape versus a subdivided version of that large shape. Click to embiggen.

Hundreds of GIS intersection comparisons are completed every hour on Chicago Cityscape.*

People are looking at, say, a map of the South Shore community area. That “Place” page then grabs all of the building permits, building violations, business licenses, and other “feature layers” that are stored as points.

A classic “point in polygon” comparison is made using the `ST_Intersects(place_geometry, permits_geometry)` function.

This has worked very well for several years.

The problem

But as Chicago Cityscape handles larger shapes – they come from users drawing their own, large shapes, and from large shapes like the downtown Chicago area – this query doesn’t cut it.

Setting indexes on the geometry is imperative, but it’s not the end of the to optimize performance. That’s because the index of the geometry is a rectangular bounding box (which is also called an “envelope” in GIS) that contains the entire shape of the South Shore community area.

The downtown Chicago area, however, is not even the largest shape I have. That belongs to the new Place, “Neighborhood Opportunity Fund investment zones” (NOF). Combined, they cover 75 square miles of Chicago. Downtown is only 7.7 square miles.

After I added the NOF map and tested its Place page, it “crashed” my server, metaphorically speaking. The query to just count the number of building permits in the area would take about five minutes.

There had to be a better way; in the meantime, however, I divided the NOF map into the West and South sections. This hardly improved the counting time.

The solution

Thankfully, today, I saw a tweet from Paul Ramsey linking to his blog that linked to his slides from a recent presentation about the use of PostgreSQL to store and manipulate GIS data.

In it he explained how the `ST_Subdivide`function worked. I’m going to demonstrate it using graphics from my own maps.

A normal intersection comparison, using `ST_Intersects(place_geometry, permits_geometry)` in a query creates a bounding box (envelope) around each geometry and quickly determines whether the two envelopes overlap. If they do, then it checks again to see if the actual geometries overlap. If they do, that data is returned as a response to your query.

When your two datasets are massive, like the NOF zones, which collectively cover 1/3rd of Chicago, and the building permits, which are found across the entire city…well, that led to the five minutes counting time.

Enter `ST_Subdivide`. To use it properly you would run it against your existing geometry and store the much smaller shapes, derived from the big shape, in a new table. I applied the function to all the 22,203 maps that Chicago Cityscape has and stored their unique IDs and subdivided geometries in a new “lookup” table.

Now, any time I want to compare the building permits against the NOF, the building permits are instead compared to the small shapes that were subdivided.

The query

Chicago Cityscape uses a single table (created as a materialized view) to combine all 22,203 maps. Each map is stored in a source table (for example, there’s a table to hold the 77 community areas) and the materialized view runs once a day to combine all of the maps in the source tables. This ensures our data is managed well: different source tables can hold different information, and the single table holds only the name, type, and geometry of the source tables, for faster comparison. Each entry in the single table also has a “slug”, its unique identifier.

Thus, the materialized view of the subdivided maps is created from the aforementioned single table, using this query:

`create materialized view view_places_subdivided as`
`select gid || '_' || random() as gid, slug, st_subdivide(geom) as geom`
`from view_places;`

The “gid” is designed to create a new unique ID field, as the slug field will be repeated for every subdivided of each map. A unique ID field is necessary if you want to refresh the materialized view concurrently (to allow for other queries to access the materialized view while it’s being refreshed).

* The results are cached for a few hours, because the feature layers change 1-2 times per day and at different times each day, so the limited duration cache accommodates that. Ideally I would code a way to invalidate the cache when the feature layer data is updated.

Update 12/31/19: ST_Subdivide will fail if your geometries have any or certain geometry errors (I don’t know if it’s any kind of error, or certain kinds of errors that make the function fail). Chicago Cityscape has over 37,000 features that ST_Subdivide is attempting to process, and there is a lot of room for error in managing that many features from dozens of sources.

I’m leading the development of a website for Slow Roll Chicago that shows the distribution of bike lane infrastructure in Chicago relative to key and specific demographics to demonstrate if the investment has been equitable.

We’re using GitHub to store code, publish meeting notes, and host discussions with the issues tracker. Communication is done almost entirely in GitHub issues. I chose GitHub over Slack and Google Groups because:

1. All of our research and code should be public and open source so it’s clear how we made our assumptions and came to our conclusions (“show your work”).
2. Using git, GitHub, and version control is a desirable skill and more people should learn it; this project will help people apply that skill.
3. There are no emails involved. I deplore using email for group communication.*

The website focuses on using empirical research, maps, geographic analysis to tell the story of bike lane distribution and requires processing this data using GIS functions. Normally the data would be transformed in a desktop GIS software like QGIS and then converted to a format that can be used in Leaflet, an open source web mapping library.

Relying on desktop software, though, slows down development of new ways to slice and dice geographic data, which, in our map, includes bike lanes, wards, Census tracts, Divvy stations, and grocery stores (so far). One would have to generate a new dataset if our goals or needs changed .

I’ve built maps for images and the web that way enough in the past and I wanted to move away from that method for this project and we’re using Turf.js to replicate many GIS functions – but in the browser.

Yep, Turf makes it possible to merge, buffer, contain, calculate distance, transform, dissolve, and perform dozens of other functions all within the browser, “on the fly”, without any software.

After dilly-dallying in Turf for several weeks, our group started making progress this month. We have now pushed to our in-progress website a map with three features made possible by Turf:

1. Buffer and dissolving buffers to show the Divvy station walk shed, the distance a reasonable person would walk from their home or office to check out a Divvy station. A buffer of 0.25 miles (two Chicago blocks) is created around each of the 300 Divvy stations, hidden from display, and then merged (dissolved in traditional GIS parlance) into a single buffer. The single buffer –called a “super buffer” in our source code – is used for another feature. Currently the projection is messed up and you see ellipsoid shapes instead of circles.
2. Counting grocery stores in the Divvy station walk shed. We use the “feature collection” function to convert the super buffer into an object that the “within” function can use to compare to a GeoJSON object of grocery stores. This process is similar to the “select by location” function in GIS software. Right now this number is printed only to the console as we look for the best way to display stats like this to the user. A future version of the map could allow the user to change the 0.25 miles distance to an arbitrary distance they prefer.
3. Find the nearest Divvy station from any place on the map. Using Turf’s “nearest” function and the Context Menu plugin for Leaflet, the user can right-click anywhere on the map and choose “Find nearby Divvy stations”. The “nearest” function compares the place where the user clicked against the GeoJSON object of Divvy stations to select the nearest one. The problem of locating 2+ nearby Divvy stations remains. The original issue asked to find the number of Divvy stations near the point; we’ll likely accomplish this by drawing an invisible, temporary buffer around the point and then using “within” to count the number of stations inside that buffer and then destroy the buffer.

Right-click the map and select “Find nearby Divvy stations” and Turf will locate the nearest Divvy station.

* I send one email to new people who join us at Open Gov Hack Night on Tuesdays at the Mart to send them a link to our GitHub repository, and to invite them to a Dropbox folder to share large files for those who don’t learn to use git for file management.

Screenshot 1: Showing how some streets are not being counted. There should be a yellow section of road between the two existing yellow road sections.

A friend recently asked me how many blocks of road are in his ward. He wanted to know so that he could measure how many blocks of streets would have an older style of street lighting after X number of blocks receive the new style of street lighting. For this project, I used two datasets from Chicago’s open data portal: street center lines and wards. The output data is not very accurate as there may be some overlap and some uncounted street segments; this is likely due to a shortcoming in my process. I will show you how to find the number of blocks per ward using QGIS (download Quantum GIS, a free program for all OSes).

Here’s how I did it

1. Load in the two datasets. Wards and street center lines (zipped shapefiles). They are projected in EPSG:3435.
2. Exclude several road classifications in the street center lines by querying only for `"CLASS" > '1' AND "CLASS" < '5'`. The data dictionary for the road classifications is at the end. We don’t want the river, sidewalks, expressways, and any ramps to be included in the blocks per ward analysis.
3. Intersect. In QGIS, select Vector>Geoprocessing Tools>Intersect. The input vector layer is “Transportation” (the name of the street center lines dataset) and the intersect layer is “Wards”. Save the resulting shapefile as “streets intersect wards”. Click OK. This will take a while.
5. You’ll notice some of the issues with the resulting shapefile: missing street segments (see screenshot 1). What should QGIS do if a street is a ward boundary?
6. Obtain street length information, part 1. Remove all the columns in the “streets intersect wards” shapefile that have something to do with geometry. These are now outdated and will confuse you when you add a geometry column generated by QGIS.
7. Obtain street length information, part 2. With the “streets intersect wards” shapefile selected in the table of contents, select Vector>Geometry Tools>Export/Add geometry columns. Select “streets intersect wards” shapefile as your input layer, leave CRS as “Layer CRS” and save as new shapefile “streets intersect wards geom”.
9. You will see a new column at the end of the attribute table called LENGTH. Since the data is projected in EPSG:3435 (Illinois StatePlane NAD83 East Feet), the unit is feet.
10. Simply export “streets intersect wards geom” to a CSV file and open the CSV file in a spreadsheet application. From there you can group the data by Ward number and add the street lengths together. (I thought it would be faster to do this in a database so I imported it into a localhost MySQL database and ran a simple query, `SELECT wardNum, sum(`chistreets_classes234`.`LENGTH`) as sum FROM chistreets_classes234 WHERE ward > 0 group by wardNum`. I then exported this to a spreadsheet to convert feet to miles.)

Because of the errors described in step 5, you shouldn’t use this analysis for any application where accuracy is important. There are road lengths missing in the output dataset (table with street lengths summed by ward) and I cannot tell if the inaccuracy is equally distributed.

[table id=7 /]

Wards 19 (south side) and 41 (Norwood Park, including O’Hare airport) have the highest portion of street length in the city.

Screenshot 2: Ward 41 is seen.

Street data dictionary

Column is “CLASS”. The value is a string. This dataset lacks alleys. Adapted from the City’s data dictionary.

1. Expressway

2. Arterials (1 mile grid, no diagonals)

3. Collectors (includes diagonals)

4. Other streets (side streets, neighborhood streets)

5. Named alleys (mostly downtown, like Couch Place and Garland Place)

7. Tiered (lower level streets, including LaSalle, Michigan, Columbus, and Wacker)

9. Ramps (goes along with expressway)

E. Extent (not sure how to describe these; includes riverwalk and lake walk segments, and Navy Pier, also includes some streets, like Mies van der Rohe Way)

RIV. River

S. Sidewalk

99. Unclassified

I presented to the Chicago GIS Network Meetup group on February 5,2013, about alternatives to Google when it comes to mapping on the web. I created the presentation and outline a couple hours before giving it and came up with this slideshow with three frames.

Google Maps and its data is a one-way street (or many one-way streets). Google will take data but won’t give it back.

Google Maps has all of these features, but they’re easier to manipulate when you use an alternative. Alternatives like: MapBox, TileMill, OpenLayers, OpenStreetMap (made easy with JOSM), GeoCommons – I’m sure there are plenty more.

OpenStreetMap is the Wikipedia of online mapping and geographic data. Considering switching to OSM.

This interactive map was created for a Grid Chicago article to show the cities and Amtrak routes mentioned. Click and drag it around or hover your mouse on the red train station markers.

Want to create a map like that and publish it on your own website? It’s easy. I’ll show you how to do it in less than 30 minutes. First, download the following files:

All shapefiles are from the United States Department of Transportation, Bureau of Transportation Statistics’s National Transportation Atlas 2012 edition except for Illinois places, which comes from the Census Bureau’s TIGER project.

At the end of this tutorial, you’ll have a good introduction on how to find geographic data, build a map with TileMill, style the map, and publish it for the public. Your map will not look like mine as this tutorial doesn’t describe how to add labels or use the hover/info feature.

Tutorial to make Amtrak Illinois map

1. Unzip the four ZIP files you downloaded and move their contents into a folder, like /Documents/GIS/Amtrak Illinois/shapefiles. This is your project folder.
2. Install TileMill and open it.
3. Set up a project. In the Projects pane, click “New Project”. In the filename field, title it “amtrak_illinois”. Ensure that the checkbox next to “Default data” is checked – this shows a world map and helps you get your bearings (but it’s not absolutely necessary).
4. Get familiar with TileMill’s layout. Your new project will open with the map on the left side and your Carto style code on the right side. There are four buttons aligning the left edge of your map. From top to bottom they are: Templates, Font list, Carto guide, and Layers.
5. Add a layer. We’re going to add the four shapefile layers you downloaded. Click the “Layers” button and then click “Add layer”. In the ID field, type in “amtrak_routes”. For Datasource, browse to your project folder and find “amtrak.shp” – this file has the Amtrak route lines. Then click “Done”. Click “Save & Style”.
6. Style that layer. When you click “Save & Style” after adding a layer, your attention will be called to the Carto style code on the right side of TileMill. A section of code with the “amtrak_routes” #selector will have been inserted with some default colors and styles. If you know CSS, you will be familiar with how to change the Amtrak routes line styles. Change the “line-color” to “#000”. After “line-color”, add a new line and insert “line-opacity: 0.5;”. This will add some transparency to the line. Press the “Save” button above the code.
7. Add remaining layers. Repeat Step 5 and add 3 more layers: “amtrk_sta.shp” (ID field: “amtrak_stations”), “state.shp” (ID field: “states”), and “tl_2012_17_place.shp” (ID field: “illinois_cities”).
8. Hide bus stations. The Amtrak stations layer shows bus and ferry stations as part of Amtrak’s Thruway connections. You probably don’t want to show these. In your Carto style code, rename the #selector from “#amtrak_stations” to “#amtrak_stations[STNTYPE=’RAIL’]”. That makes the following style code only apply to stations with the “rail” type. Since there’s no style definition for things that aren’t of that type, they won’t appear.

Screenshot of my map.

TileMill has many exporting options. You can save it as MBTiles and publish the map for free using MapBox (TileMill’s parent), or you can export it as image files (but it won’t be interactive), or you can display the map using the Leaflet JavaScript map library (which I use for the Chicago Bike Map app). This tutorial will explain how to export MBTiles and upload to MapBox, the server I’m using to display the map at the top of this page.

1. Change project settings. To upload to MapBox, you’ll have to export your project as MBTiles, a proprietary format. Click the “Export” button above your Carto style code and click “MBTiles”. You’ll be asked to provide a name, description, attribution, and version. Input appropriate text for all but version.
2. Adjust the zoom levels. Adjust the number of zoom levels you want (the more you have the longer it takes to export and upload your project, and you might exceed MapBox’s free 50 MB account limit). My map has zoom levels 8-11.
3. Adjust the bounds. You’ll then want to draw your bounds: how much of the map’s geographic extents you want to export. Zoom to a level where you can see the entire state of Illinois in your map. Hold down the Shift key and drag a box around the state, plus a buffer (so viewers don’t fall of your map when they pan to the edges).
4. Export your map. Click Export and watch the progress! On a four-year-old MacBook it took less than one minute to export the project.
5. Bring the export to your project folder. When export finishes, click the “Save” button and browse to your project folder. Click the file browser’s save button.
7. Embed it in your website. Click the “Share” button in the upper left corner of your map and copy the embed code. Paste this into the HTML source code of a webpage (or in a WordPress post) and save that (I’m not going to provide instructions on how to do that).

Now you know how to find geographic data, build a custom map using the TileMill application, begin to understand how to style it, and embed your map for the public on a website or blog.

N.B. I was originally going to use QGIS to build a map and then publish a static image before I realized that TileMill + MapBox (the website) can build a map but publish an interactive feature instead of a static image. I’m happy I went that route. However, I did use QGIS to verify the data and even create a new shapefile of just a few of the key train stations on the Lincoln Service (the centerpiece of my Grid Chicago article).

While working on a private web application that I call Chicago Crash Browser, I added some code to show the share of pedestrian and pedalcyclist crashes. The site offers users (sorry I don’t have a web server that can make it public) a list of the “Top 10” intersections in terms of bike crash frequency (that’s bike+auto crash). You can click on the intersection and a list will populate showing all the pedestrian and pedalcyclist crashes there, sorted by date. At the bottom of the list is a simple sentence that tells what percentage pedestrian and pedalcyclists made up at that intersection.

I’m still developing ideas on how this information may be useful, and what it’s saying about the intersection or the people using it.

Let me tell you about a few:

Milwaukee Avenue and Ogden Avenue

I mentioned in my article Initial intersection crash analysis for Milwaukee Avenue that this intersection is the most bike crash-frequent.

23 crashes within 150 feet of the center, 2005-2010

82.61% bike crashes **

17.39% ped crashes.

Ashland Avenue and Division Street

28 crashes within 150 feet of the center, 2005-2010

46.43% bike crashes

53.57% ped crashes **

Milwaukee, North and Damen Avenues

46 crashes within 150 feet of the center, 2005-2010

39.13% bike crashes

60.87% ped crashes **

Halsted Street, Lincoln and Fullerton Avenues

38 crashes within 150 feet of the center, 2005-2010

42.11% bike crashes

57.89% ped crashes **

Montrose Avenue and Marine Drive (Lake Shore Drive ramps)

11 crashes within 150 feet of the center, 2005-2010

90.91% bike crashes **

9.09% ped crashes

Why do you think some intersections have more of one kind of crash than the other?

People walking at Milwaukee-North-Damen.

The Chicago Crash Browser can be made public if I have a host that offers the PostgreSQL database. Do you have one to offer?

Common bike-car crash locations in West Town. The bottom blue circle identifies Ogden/Milwaukee, where there is a yellow trap for northbound, left-turning motorists (from Milwaukee to Ogden) that makes them run into southbound bicyclists who have a green light.

My contribution to a discussion on The Chainlink, Are protected bike lanes going in the right places?

Kelvin, Milwaukee/Ogden/Chicago is the intersection along Milwaukee Avenue with the highest number of bicycle crashes. I created this table and map to show them, using data from 2007-2009.

The blue rings on the map are called, in GIS parlance, “buffers” and are circles used to select things (in this case, bike crashes) within a certain distance of the circle center. In this map I used 50 feet radius buffers (100 feet diameter). While this distance encompasses the intersection from center to all four curbs, it doesn’t encompass the crashes that happened just outside the buffer that were still most likely influenced by the intersection (like drivers’ turning movements).

I am working on a project with three friends to create a better map and “crash browser”. I mentioned it in the last story on Grid Chicago in this post. For this project, we are using 200 feet radius (400 feet diameter) buffers to ensure we encompass the entire intersection and the area in which it still has an effect. This also grabs the bike lane “pinch points”, places where a bike lane doesn’t start until 100-200 feet beyond the intersection.

I am also concerned with the strategy and approach CDOT is using to choose locations. It’s not transparent; at MBAC, CDOT said they were choosing locations “without controversy and that could be implemented quickly”.

Read more about Kinzie Street, Chicago’s first protected bike lane, and my other thoughts on protected bike lanes

It is now possible to upload a shapefile (and its companion files SHX, PRJ, and DBF) to Google Fusion Tables (GFT).

Before we go any further, keep in mind that the application that does this will only process 100,000 rows. Additionally, GFT only gives each user 200 MB of storage (and they don’t tell you your current status, that I can see).

2. Prepare your data. Ensure it has fewer than 100,000 rows.
3. ZIP up your dataX.shp, dataX.shx, dataX.prj, and dataX.dbf. Use WinZip for Windows, or for Mac, right-click the selection of files and select “Compress 4 items”.
4. Visit the Shape to Fusion website. You will have to authorize the web application to “grant access” to your GFT tables. It needs this access so that after the web application processes your data, it can insert it into GFT.
5. If you want a Centroid Geometry column or a Simplified Geometry column added, click “Advanced Options” and check their checkboxes – see notes below for an explanation.
7. Leave the window open until it says it has processed all of the rows. It will report “Processed Y rows and inserted Y rows”. You will be given a link to the GFT the web application created.

Sample Data

If you’re looking to give this a try and see results quickly, try some sample data from the City of Chicago data portal:

Notes

I had trouble many times while using Shape to Fusion in that after I chose the file to upload and clicked Upload, I had to grant access to the web application again and start over (choose the file and click Upload a second time).

Centroid Geometry – This creates a column with the geographic coordinates of the centroid in a polygon. It lists it in the original projection system. So if your projection is in feet, the value will be in feet. This is a function that can easily be performed in free and open source QGIS, where you can also reproject files to get latitude and longitude values (in WGS84 project, EPSG 4326). The centroid value is surrounded in the field by KML syntax “<Point><coordinates>X,Y</coordinates></Point>”.

Simplified Geometry – A geometry column is automatically created by the web application (or GFT, I’m not sure). This function will create a simpler version of that geometry, with fewer lines and vertices. It also creates columns to list the vertices count for the simple and regular geometry columns.

Plugins for QGIS I use most often.

All of these can be installed automatically by QGIS. Click on Plugins>Fetch Python Plugins. Then search for the plugin, click on its name, and click Install Plugin. Few plugins require a restart.

• MMQGIS – Great for working with CSV files; also merges layers (even if they have differing attributes); has various other useful functions, including converting string data to float data. HasÂ Voronoi diagram function (takes a long time to process).
• fTools – Replicates some of the most basic geographic tools in ArcGIS, like Clip, Dissolve, and Reproject. Can also add X/Y values to point attribute tables that are missing them (if you want latitude/longitude, you must reproject into a coordinate reference system first, like WGS84 [EPSG: 4326]). Unfortunately, there’s little information on what each fTools function does. Below are descriptions:
• Extract Nodes – Create a point at each intersection of vertices.
• Basic Statistics – Generate arithmetic statistics for fields, same as statistics function in ArcGIS. Great for quickly understanding the extent of values in a field (especially numeric values), like mean, max, min, standard deviation, and number of unique values.
• Nearest Neighbour Analysis – More details here.
• Geoprocessing Tools>Dissolve – Combine features based on a shared attribute. For example, all features with an identical STREET_TYPE be combined into a single feature. For example, all “Avenues” will become one feature and all “Boulevards” will become a second feature. Only works on polygon layers.
• DescriÃ§Ã£o em portuguÃªs
• Table Manager
• Open Layers – Embed Google, Yahoo, Bing, and OpenStreetMap layers in your map. See my example.