Tag: open data

Working with ZIP code data (and alternatives to using sketchy ZIP code data)

1711 North Kimball Avenue, built 1890

This building at 1711 N Kimball no longer receives mail and the local mail carrier would mark it as vacant. After a minimum length of time the address will appear in the United States Postal Service’s vacancy dataset, provided by the federal Department of Housing and Urban Development. Photo: Gabriel X. Michael.

Finding accurate ZIP code data for your geographic publication (website or report) or demographic analysis can be problematic. The most accurate dataset – perhaps the only one that could be called reliably accurate – is one you purchase from one of the United States Postal Service’s (USPS) authorized resellers. If you want to skip the introduction on what ZIP codes really represent, jump to “ZIP-code related datasets”.

Understanding what ZIP codes are

The post office’s ZIP code data – which USPS uses to deliver mail, not to locate people the way your publication or analysis would – is not free. It is also, unbeknownst to many, a dataset of mail carrier routes: it’s not a set of boundaries or polygons, although many of the authorized resellers transform it into boundaries so buyers can geocode the locations of their customers (retail companies might use this for customer tracking and profiling, and petition-creating websites for determining your elected officials).

The Census Bureau has its own issues using ZIP code data. For one, the ZIP code data changes as routes and delivery points change. Census boundaries need to stay somewhat constant so that geographies can be compared over time, and Census tracts stay the same for a period of 10 years (between the decennial censuses).

Understanding that ZIP codes are well known (everybody has one and everybody knows theirs) and that it would be useful to present data at that level, the Bureau created “ZIP Code Tabulation Areas” (ZCTAs) for the 2000 Census. They’re built from Census blocks to approximate a ZIP code’s area (and they often share the same 5-digit identifiers). A ZCTA and the area a ZIP code actually covers overlap substantially, but not exactly. ZCTA data is freely downloadable from the Census Bureau’s TIGER shapefiles website.

There’s a good discussion about what ZIP codes are and aren’t on the GIS StackExchange.

Chicago example of the problem

Here’s a real-world example of the kind of problem that ZIP code data availability and comprehension can cause: those working on the Chicago Health Atlas ran into it because they were using two different datasets: ZCTAs from the Census Bureau and ZIP codes as prepared by the City of Chicago and published on its open data portal. Their solution – really a stopgap that needs further review, not just by those involved in the app but by a diverse group of data experts – was to add a disclaimer that they use ZCTAs instead of the USPS’s ZIP code data.

ZIP-code related datasets

Fast forward to why I’m telling you all of this: The U.S. Department of Housing and Urban Development (HUD) has two ZIP-code based datasets that may prove useful to mappers and researchers.

1. ZIP code crosswalk files

This is a collection of eight datasets that link a level of Census geography to ZIP codes (and the reverse). The most useful to me is ZIP to Census tract. This dataset tells you in which ZIP code a Census tract lies (including whether it spans multiple ZIP codes). HUD creates these files from USPS data.

The datasets are well documented on HUD’s website and updated quarterly, with files going back to 2010. The most recent ZIP-to-tract file comes as a 12 MB Excel spreadsheet.
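Once downloaded, the crosswalk is easy to work with. Here’s a minimal sketch in Python with pandas – the file name, tract ID, and example counts are placeholders, and the column names (ZIP, TRACT, RES_RATIO) match the crosswalk files I’ve seen, so verify them against the quarter you download:

```python
# Sketch: joining HUD's ZIP-to-tract crosswalk to other data with pandas.
# File name, tract ID, and the ZIP-level counts below are placeholders;
# check the column names against your quarter's spreadsheet.
import pandas as pd

crosswalk = pd.read_excel("ZIP_TRACT_032014.xlsx", dtype={"ZIP": str, "TRACT": str})

# Every ZIP a given tract touches, with the share of that ZIP's residential
# addresses that fall inside the tract.
tract = "17031010100"  # hypothetical Cook County tract ID
print(crosswalk.loc[crosswalk["TRACT"] == tract, ["ZIP", "RES_RATIO"]])

# Allocate a ZIP-level count (say, survey responses per ZIP) down to tracts
# in proportion to residential addresses.
by_zip = pd.DataFrame({"ZIP": ["60647", "60618"], "responses": [120, 80]})
allocated = crosswalk.merge(by_zip, on="ZIP")
allocated["share"] = allocated["responses"] * allocated["RES_RATIO"]
print(allocated.groupby("TRACT")["share"].sum().sort_values(ascending=False).head())
```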

2. Vacant addresses

The USPS employs thousands of mail carriers to deliver things to the millions of households across the country, and it keeps track of when a mail carrier cannot deliver something because no one lives in the apartment or house anymore. The address vacancy data tells you the following characteristics at the Census tract level:

  • total number of addresses the USPS knows about
  • number of addresses on urban routes to which the mail carrier hasn’t been able to deliver for 90 days or longer
  • “no-stat” addresses: undeliverable rural addresses, places under construction, urban addresses unlikely to be active

You must register to download the vacant addresses data and be a governmental entity or non-profit organization*, per the agreement** HUD has with USPS. Learn more and download the vacancy data, which HUD updates quarterly.

Tina Fassett Smith is a researcher at DePaul University’s Institute of Housing Studies and reviewed part of this blog post. She stresses that readers should ignore the “no-stat” addresses in the USPS’s vacancy dataset: research by her and her colleagues at the IHS concluded that this section of the data is unreliable. Tina also said that the methodology mail carriers use to identify vacant addresses and places undergoing change (construction or demolition) isn’t made public, and that mail carriers are expected to collect the data without being compensated for it beyond their normal pay. Tina further explained the issues with no-stat:

We have seen instances of a relationship between the number of P.O. boxes (i.e., the presence of a post office) and the number of no-stats in an area. This is one reason we took it off of the IHS Data Portal. We have not found it to be a useful data set for better understanding neighborhoods or housing markets.

The Institute of Housing Studies provides vacancy data on their portal for those who don’t want to bother with the HUD sign-up process to obtain it.

* It appears that HUD doesn’t verify your eligibility.

** This agreement also states that one can only use the vacancy data for the “stated purpose”: “measuring and forecasting neighborhood changes, assessing neighborhood needs, and measuring/assessing the various HUD programs in which Users are involved”.

How to ascertain the area of Chicago beach parking lots to find the largest one

This tutorial is a direct response to a question about which Chicago beach has the largest parking lot. Matt Nardella of Moss Design, responding to a Twitter conversation about Alderman Cappleman’s suggestion that perhaps Montrose Beach has too much parking, turned to Wikipedia for the answer, which says that Montrose Beach has the largest parking lot of any of Chicago’s 27 beaches.

Now we’re going to try and prove which beach has the largest associated parking lot.

This tutorial will teach you how to (1) display Chicago beaches, (2) download data held in OpenStreetMap, (3) find the parking lots within the OpenStreetMap data, (4) find the parking lots near the beaches, and (5) calculate each parking lot’s area (in square feet). You can use this tutorial to accomplish any one of these five tasks, or the same tasks on a different part of OpenStreetMap data (like the area of indoor shopping malls).

You’ll need the QGIS software before starting. You’ll also need at least 500 MB of free space. Start a project folder called “Biggest Parking Lots in Chicago” and make two more folders, within this folder, called “origdata” and “data”.

First, let’s get some data about beaches

Since we only want to know about the parking lots near Chicago beaches, we need a dataset that locates the beaches. This data is presumably within the same OpenStreetMap extract we’ll download in the next task, but it’s best to go to the most reliable source. (If you prefer scripting to clicking, a scripted version of these steps appears after the list below.)

  1. Download the Parks – Facilities & Features shapefile from the City of Chicago open data portal. I’ve already verified that it has all the beaches (as points).
  2. Open the parks shapefile in a new document in QGIS (call it “map01a.qgs”). You might not see the data so right-click the parks layer and select “Zoom to layer extent”.
  3. Filter out all the points that aren’t beaches by using the query builder. Right-click the layer and select “Filter…” and input this filter expression: “FACILITY_N” = ‘BEACH’
  4. Your map will now show 26 points along an invisible lakefront and then the beach at Humboldt Park.
  5. For the rest of this tutorial we’ll reference the beaches layer as ParkFacilities.
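The same two steps can be run from the QGIS Python console (QGIS 3 shown here; the shapefile name is a placeholder for whatever the data portal gives you):

```python
# Run from the QGIS Python console. "origdata/parks.shp" is a placeholder
# for the Parks - Facilities & Features shapefile you downloaded.
from qgis.core import QgsVectorLayer, QgsProject

parks = QgsVectorLayer("origdata/parks.shp", "ParkFacilities", "ogr")
parks.setSubsetString('"FACILITY_N" = \'BEACH\'')  # same filter expression as step 3
QgsProject.instance().addMapLayer(parks)
print(parks.featureCount())  # should be 27: 26 lakefront points plus Humboldt Park
```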

Second, let’s get some data from OpenStreetMap

The easiest way to grab data from OpenStreetMap is by using QGIS, a free, open source desktop GIS application that has myriad plugins that match the capabilities of the heavyweight ESRI ArcGIS line of software. We can download OpenStreetMap data straight into QGIS.

  1. Click on the Vector menu and select OpenStreetMap>Download data.
  2. We want enough data to cover the beaches layer, so in the Extent section of the dialog box choose “From layer” and select the beaches layer (called ParkFacilities).
  3. Browse to the “origdata” folder you created in the first task and choose the filename “chicago.osm”.
  4. Click OK and watch the progress meter tell you how much data you’ve downloaded from OpenStreetMap.
  5. Once it’s completed downloading, click “Close”. Now we want to add this data to our map.
  6. Drag the chicago.osm file from your file system into the QGIS Layers list. A dialog box will appear asking which layers you want to add.
  7. Select the layer that has the type “MultiPolygon”. This represents areas like buildings and parking lots.

Third, display the OpenStreetMap data and eliminate everything but the parking lots

We only want to compare parking lots in this dataset with beaches in the previous dataset so we need to eliminate everything from the OpenStreetMap data that’s not a parking lot. Since OSM data depends on tags we can easily select and show all the objects where “amenity” = “parking”.

  1. Filter out all the polygons that aren’t parking lots by using the query builder. Right-click the layer and select “Filter…” and input this filter expression: “amenity” = ‘parking’. Hopefully all the parking lots have been drawn so we can analyze a complete dataset!
  2. Your map will now show little squares, rectangles, and myriad odd shapes that represent parking lots around Chicagoland. (Most of these have been drawn by hand.) It should look like Image XXX.
  3. Since the beaches data is stored in the projection EPSG:3435 (Illinois StatePlane, measured in feet) and the OpenStreetMap data is stored in EPSG:4326, we need to convert the parking lots layer to match the beaches (because we’re going to measure distances in feet instead of degrees).
  4. Right-click the layer and select “Save As…” and choose the format “ESRI Shapefile”. Then click the top Browse button and select a location on your hard drive for the converted file.
  5. For “CRS” choose “Selected CRS”. Then click the bottom Browse button and search for the EPSG with the codename 3435. Select the checkbox named “Add saved file to map” so the new layer will be immediately added to our map.

Fourth, select all the parking lots near a beach

This task will select all the parking lots near the beaches. I chose 2,000 feet but you could easily choose a different distance. You might want to measure some minimum and maximum distances between beaches and their associated parking lots on Google Earth.

(This task is easier in PostGIS, which has an ST_DWithin function for finding objects within a certain distance of each other, because we can avoid creating the buffer in QGIS. A sketch of that shortcut follows.)
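Here’s that PostGIS shortcut as a sketch, run from Python with psycopg2. The table and column names (beaches, parking_lots, geom) are placeholders for however you loaded the layers, and both layers must be in EPSG:3435 so the distance is in feet:

```python
# Sketch of the ST_DWithin shortcut mentioned above. Table/column names are
# placeholders; load both layers into PostGIS in EPSG:3435 first so the
# distance argument is in feet.
import psycopg2

distance_ft = 3000  # 2,000 ft misses Montrose, as the steps below show
conn = psycopg2.connect("dbname=chicago")
cur = conn.cursor()
cur.execute(
    """
    SELECT DISTINCT p.id
    FROM parking_lots AS p
    JOIN beaches AS b ON ST_DWithin(p.geom, b.geom, %s)
    """,
    (distance_ft,),
)
lot_ids = [row[0] for row in cur.fetchall()]
print(len(lot_ids), "parking lots within", distance_ft, "feet of a beach")
```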

  1. Create a 2,000 feet buffer. Select Vector>Geoprocessing tools>Buffer.
  2. In the Buffer(s) dialog box, select ParkFacilities (which has your beaches) as the “Input vector layer”. Choose a distance of 2000 (the units are pre-chosen by the projection and since we’re using a projection that’s in feet, the distance unit will be feet).
  3. Browse to your project folder’s “data” folder and save the “Output shapefile” as “beaches buffer 2000ft.shp”.
  4. Click “Add results to canvas” and then click OK.
  5. Double check that 2,000 feet was enough to select the parking lots. In my case, I see that the point representing Montrose beach was further than 2,000 feet away from a parking lot.
  6. Let’s do it again but with 3,000 feet this time, and saving the “Output shapefile” as “beaches buffer 3000ft.shp”.
  7. This time it worked and the nearest parking lots are now in the 3,000 feet radius buffer. You can see in Image XXX how the two concentric circles stretch out from the beach point towards the parking lots.

We’re not done. We’re next going to use our newly created 3,000 feet buffers to tell us which parking lots are in them. These will be presumed to be our beach parking lots.

  1. Use the “Select by location” tool to find the parking lots that intersect our 3,000 feet buffers. Select Vector>Research Tools>Select by location.
  2. Follow me: we want to select features in parking 3435 [our parking lots] that intersect features in beaches buffer 3000ft [our beach buffers]. We’ll modify the current selection by creating a new selection so that we don’t accidentally include any features previously selected.
  3. You’ll now see a bunch of parking lots turn yellow meaning they are actively selected.
  4. Let’s save our selected parking lots as a new file so it will be easier to analyze just them. Right-click “parking 3435” and select “Save Selection As…” (it’s important to choose “Save Selection As” instead of “Save As” because the former will save just the parking lots we’ve selected).
  5. Save it as “selected parking 3435.shp” in your “data” folder. The CRS should be EPSG:3435 (NAD83 Illinois StatePlane East Feet). Check off “Add saved file to map” and click OK.
  6. Turn off all other layers except ParkFacilities to see what we’re left with and you’ll see what I show in Image XXX.

Fifth, let’s calculate

Calculating the area is probably the easiest part of this tutorial.

  1. Close all attribute tables you may have opened.
  2. Select Vector>Geometry Tools>Export/Add geometry columns and choose “selected parking 3435” as your input vector layer.
  3. Leave all other options as-is and press OK. When told about how QGIS can’t access something simultaneously, choose “Yes”.
  4. QGIS should have told you that “selected parking 3435” has been updated. Right-click the layer and choose “Open Attribute Table”.
  5. Scroll to the far right and you’ll see a new column called AREA. This represents the parking lot’s area in square feet.
  6. Click on the AREA column heading to sort it from smallest to largest. Scroll to the bottom of the list and you’ll find the parking lot with the largest area. Double check – is it near a beach?
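If you’d rather skip the geometry-columns step, the QGIS Python console can compute the same thing on the fly – a sketch against the layer created in the fourth task (areas come back in square feet because the layer is in EPSG:3435):

```python
# Run from the QGIS Python console after the steps above.
# Assumes the layer from task four is named "selected parking 3435".
from qgis.core import QgsProject

layer = QgsProject.instance().mapLayersByName("selected parking 3435")[0]
biggest = max(layer.getFeatures(), key=lambda f: f.geometry().area())
area_sqft = biggest.geometry().area()
print(round(area_sqft), "square feet, or", round(area_sqft / 43560, 2), "acres")
```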

Conclusion

With my analysis, and with the data available from OpenStreetMap when I created this tutorial, there are three abnormally large parking lots:

  1. A linear lot near the Lincoln Park Zoo and North Avenue beach (6.8 acres)
  2. A curving lot near Montrose Beach (4.75 acres)
  3. An irregularly shaped lot near Montrose Beach (4.5 acres)

There’s one major caveat in this analysis: the parking lots at beaches south of Navy Pier are missing. No one has drawn them in OpenStreetMap yet, so it’s time to start editing!

Chicago wards with the most landmarked places

Montgomery Ward Complex

People float by the Montgomery Ward Complex on kayaks. Photo by Michelle Anderson.

Last week I met with the passionate staff at Landmarks Illinois to talk about Licensed Chicago Contractors. I wanted to understand the legal framework for historic preservation and determine ways to highlight landmarked structures on the website and track any modifications or demolitions to them.

I incorporated two new geographies over the weekend: Chicago landmark districts, and properties and areas on the National Register of Historic Places (both available on the City of Chicago open data portal).

I used pgShapeLoader to import them to my DigitalOcean-hosted PostgreSQL database and modified some existing code to start looking at these two new datasets. Voila, you can now track what’s going on in the Montgomery Ward Company Complex – currently occupied by “600 W” (at 600 W Chicago Avenue) hosting Groupon among other businesses and restaurants.

Today I was messing around with some queries after I saw that the ward containing this place on the National Register – the 27th – also had a bunch of other listed spots.

I wrote a query to see which wards have the most places on the National Register. The table below lists the top three wards, with links to their pages on Licensed Chicago Contractors. You’ll find that many listings have no building permits associated with them, for two reasons: the small geography we search within for a listing’s permits may miss the geography of the issued permits (they can be a few feet off), and we don’t have a copy of all permits yet.

[Table 15, listing the top three wards by National Register listings, is not available.]

Four wards don’t have any listings on the National Register of Historic Places, and nine wards have one listing.
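For the curious, the query behind the table is a simple spatial join. Here’s a sketch via psycopg2, with hypothetical table and column names standing in for the ward boundary and National Register layers I loaded with pgShapeLoader:

```python
# Sketch of the wards-by-listings count. "wards" and "national_register"
# (each with a "geom" column) are placeholders for the pgShapeLoader tables.
import psycopg2

conn = psycopg2.connect("dbname=chicagocontractors")
cur = conn.cursor()
cur.execute(
    """
    SELECT w.ward, count(n.geom) AS listings
    FROM wards AS w
    LEFT JOIN national_register AS n ON ST_Intersects(w.geom, n.geom)
    GROUP BY w.ward
    ORDER BY listings DESC
    """
)
for ward, listings in cur.fetchall():
    print(ward, listings)
```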

Top 20 most active general contractors in Chicago this year so far

River Point tower is under construction at 444 W Lake Street by James McHugh Construction. Its associated building permit, which only comprises the base/train cover structure depicted, has an estimated project cost of $27,050,000. Photo by Bart Shore.

Here I go again not talking about urban planning on Steven Can Plan. With my Licensed Chicago Contractors & Construction Activity website I’ve ingested a lot of data from the City of Chicago’s open data portal that has a LOT to say about what’s going on.

Nearly all permits include a reasonable value that estimates the project’s cost – like $13,150,000.00 for converting Lafayette Elementary School into a performing arts high school. (All demolition permits show $1 and some permits cost over $6 billion, which we know is false.)

I recently reorganized the data (migrating from MySQL to PostgreSQL, which supports the JSON datatype) to expand the ways I can extract info, and I’ve sorted it by each general contractor’s aggregate project value for this year.

This is a list of the 20 most active general contractors in Chicago for 2014, through May 13, 2014. I define “activity” purely by a company’s involvement in Chicago-based projects, measured by those projects’ estimated costs. (The data doesn’t specify the level of a company’s involvement in a project.)

[Table 11, listing the 20 most active general contractors, is not available.]

Do any names ring a bell, or surprise you?

Finding interesting data in the building permits dataset

I had several great conversations with fellow #chihacknight visitors at the 1871 tech hub (222 W Merchandise Mart Plaza) about how to reveal more information about what’s being built in Chicago. I had introduced Licensed Chicago Contractors at the previous week’s hack night, and tonight I showed changes I’ve made to the site, like how much faster it is now that I use DataTables’ server-side processing.

Some of the discussions resulted in suggestions to try new tools and methods that would make processing the data more efficient, or more revealing. What are the ways I can aggregate the data, or connect to similar data from other sources?

One of the new features I announced I’ll be adding is statistics on building activity by neighborhood. I started testing some queries to see the results, and to find the query that outputs that information in a way that’ll pique users’ interests.

I calculated the aggregate estimated costs of all building permit activity for the past 90 days in select neighborhoods. All of the data was automatically generated using a simple MySQL query, but one that will get faster after switching to Postgres. (I eliminated any project whose estimated cost was less than $1,000 because there are many project types that are $0 to several hundred dollars.)

  • Logan Square: 77 projects, totaling $16,295,997.50 at a $211,636.33 average cost
  • West Loop: 30 projects, totaling $27,646,899.00 at a $921,563.30 average cost
  • Andersonville: 6 projects, totaling $358,770.00 at a $59,795.00 average cost
  • Bronzeville: 34 projects, totaling $17,050,662.00 at a $501,490.06 average cost
  • Hyde Park: 20 projects, totaling $13,492,265.00 at a $674,613.25 average cost
  • Humboldt Park: 35 projects, totaling $41,917,988.00 at a $1,197,656.80 average cost
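The query behind those numbers is about as simple as it sounds. Here’s a sketch – the table and column names (permits, neighborhood, estimated_cost, issue_date) are hypothetical stand-ins for my schema:

```python
# Sketch of the neighborhood aggregate described above (MySQL syntax, since
# that's what the site ran on at the time; in Postgres the date arithmetic
# becomes CURRENT_DATE - INTERVAL '90 days'). All names are hypothetical.
query = """
    SELECT neighborhood,
           COUNT(*)            AS projects,
           SUM(estimated_cost) AS total_cost,
           AVG(estimated_cost) AS average_cost
    FROM permits
    WHERE issue_date >= CURDATE() - INTERVAL 90 DAY
      AND estimated_cost >= 1000   -- drop the $0-to-a-few-hundred-dollar permits
    GROUP BY neighborhood
    ORDER BY total_cost DESC
"""
```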

How does Humboldt Park double the other neighborhoods’ average? I think it’s pretty simple: this $40 million Salvation Army residence that’s going to be built at 825 N Christiana Avenue.

The results for Bronzeville were higher than I expected because this is a distressed neighborhood that has lost a lot of population and has seen little development in the past several years. This isn’t to say the neighborhood is poor – I saw a report last fall that highlighted how the purchasing power of Bronzeville residents was quite high relative to neighboring communities.

Ronnie Harris showed me the report when I participated in the Center for Neighborhood Technology’s civic app competition and hackathon. We, along with Josh Engel, designed Build It! Bronzeville, although my participation was really pushing them to develop Josh’s game idea more and construct a paper version of it. Our team won the competition and Ronnie and Josh have kept working on it (I saw them at last week’s hack night).

Projects that pushed up Bronzeville’s average included several multi-family homes at around $1.4 million each on the 4700 and 4800 blocks of S Calumet Avenue.

Code discussion

I can’t test for the “Loop” right now with the way I have my data structured, because a LIKE ‘%loop%’ query of the database will also include “West Loop” records.
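Until the data is restructured, a workaround is to match the whole value or exclude the false positives explicitly – a sketch, assuming a hypothetical neighborhood column:

```python
# Two workarounds for the '%loop%' problem, assuming a hypothetical
# "neighborhood" column on the permits table.
exact_match = "SELECT * FROM permits WHERE LOWER(neighborhood) = 'loop'"

exclude_false_positives = """
    SELECT * FROM permits
    WHERE LOWER(neighborhood) LIKE '%loop%'
      AND LOWER(neighborhood) NOT LIKE '%west loop%'
"""
```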

I need to change how the building permit data is stored – in my database – a little so that my site’s PHP codebase and MySQL queries can sift through the data faster. For example, I’m storing several key-value pairs as a JSON-encoded string in a TEXT field. One #chihacknight developer suggested I switch from MySQL to PostgreSQL because Postgres has native JSON-parsing functions.

I looked up how to use Postgres’s JSON functions and realized that, yes, I probably should do that, but that I also need to change the array structure of the data I’m encoding to JSON. In other words, with a tiny change now, I can be better prepared for the eventual migration to Postgres.
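To give a flavor of what the migration buys, here’s a sketch of querying the JSON directly in Postgres. The column and key names (raw_contacts, contact_1_name, contact_1_type) are hypothetical:

```python
# Sketch: once the JSON-encoded TEXT column becomes a json/jsonb column,
# Postgres can reach into it without PHP decoding it first. All names and
# values below are hypothetical.
migrate = """
    ALTER TABLE permits
    ALTER COLUMN raw_contacts TYPE jsonb USING raw_contacts::jsonb
"""

query = """
    SELECT permit_number,
           raw_contacts->>'contact_1_name' AS contractor
    FROM permits
    WHERE raw_contacts->>'contact_1_type' = 'GENERAL CONTRACTOR'
"""
```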

Using open data: Showing what projects licensed Chicago contractors are working on

The New City developer recently received permits for nearly $50 million of construction work across from the Lincoln Park REI.

I wrote in my last post that I found “pain” in the process of finding a licensed contractor in the city (the pain of finding one who can install in the public way remains unmedicated).

I wanted to provide more than a list (and a map), and EveryBlock has already answered “What’s going on across the street from my house?” I wanted to add value by helping people answer the question, “What contractor should I choose?”

Several other sites help you do this, like BuildZoom, Angie’s List, and the Better Business Bureau, by showing you customer reviews or complaints. I needed something different from mimicking a review site (a lot of the businesses are also on Yelp) so I decided to answer the question, “What projects have these companies done?”

That’s where the City of Chicago’s open data portal comes in: it has a dataset for Building Permits.

Check out 180 Properties, LLC from Skokie, Illinois. They’ve had two permits issued within the last three months. One project, at 3705 N Hoyne Avenue, is for interior renovation: “Remove/replace cabinets, countertops, flooring, patch & repair drywall”. The estimated cost for the project is $80,000. Sound like the kind of contractor you’re looking for? Call them up or keep researching.

You can even see who else is working on this project. Burnham Nationwide is listed as an expeditor on this project which means they’re likely acting as the intermediary between the Chicago Department of Buildings and the companies actually doing the work. Burnham will do site plans, drawings, occupancy, and ensure everything is in order. The property owner is also listed in the permit information.

For people who want to explore construction activity the other way around, finding projects before contractors, I created a “Permits explorer” page. This page searches the Building Permits dataset to show the most recently issued permits for the most expensive projects. Right now a project to alter and renovate Chicago Vocational High School at 2100 E 87th Street has an estimated cost of $40 million. I didn’t realize how much the Department of Buildings is funded by permits until I saw the permit fees.

The permit fee for the school renovation would have been $372,598, but the dataset says it was waived entirely (likely because it’s a Chicago Public School). Other projects I reviewed had permit fees between $30,000 and $75,000.

Real estate speculators, development watchers, and editors of Curbed Chicago should find browsing permits useful. The list includes two projects associated with the New City development at Halsted Street and Clybourn Avenue, across from the Lincoln Park REI store. The two permits are held by 1515 N Halsted, LLC. The first is for a “3 story steel framed mixed-use retail, restaurant, assembly (movie theater) building” at 1500 N Clybourn Avenue (for an estimated cost of $26,403,193), and the second permit describes a 7 story parking garage at 710 W Schiller Street (for $21,518,012).

How it works

I used my programming magic – I prefer PHP – to query the Socrata Open Data API (or SODA) to look for the given contractor’s name in one of eight name fields (there are 16 name fields) and then return information about the most recent permits. The Building Permits dataset gives the project location, work description, and its estimated cost. I figured you could use the project’s estimated cost to gauge the kind of work the contractor does – is the contractor more familiar with big jobs, or little jobs?

This method isn’t the best. Ideally there’d be a relational database where the “Contractor ID” in the licensed contractors dataset would match a “Contractor ID” field in the permit dataset. But the licensed contractors dataset doesn’t have a unique ID field, and isn’t even on the data portal.

Instead, I’m finding contractor-to-project matches by looking for the first two or three words of the contractor’s name at the beginning of eight of the 16 name fields in the permits dataset. SODA runs the query quickly and passes the results back to PHP in no time.
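Here’s roughly what that lookup looks like, sketched in Python for brevity (the live site does it in PHP). The resource URL and every field name are placeholders – check them against the actual Building Permits dataset on the portal:

```python
# Sketch of the contractor-to-permit lookup against the Socrata Open Data API.
# The resource URL and all field names are placeholders for the real Building
# Permits dataset; verify them on data.cityofchicago.org before using.
import requests

BASE = "https://data.cityofchicago.org/resource/<building-permits-id>.json"
contractor = "JAMES MCHUGH CONSTRUCTION"
prefix = " ".join(contractor.split()[:2])  # first two words, as described above

where = " OR ".join(
    f"starts_with(contact_{i}_name, '{prefix}')" for i in range(1, 9)
)
params = {"$where": where, "$order": "issue_date DESC", "$limit": 10}
permits = requests.get(BASE, params=params).json()
for p in permits:
    print(p.get("issue_date"), p.get("estimated_cost"), p.get("work_description"))
```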

In the future I’d like to pull in scores and reviews from Yelp and other sites that have APIs (Angie’s List and Better Business Bureau don’t), as well as try to determine the name of the building – if it has one – by querying OpenStreetMap Nominatim.

Outta left field: I recreated the city’s contractor listing website

LicensedChicagoContractors.com looks good and works quickly on mobile devices.

I’m working on a secret project to get something installed on the public way. The process to find out how to do it is as arduous as getting it done because you never finish learning the process. Every time you think you’ve figured something out, there’s something else.

To get the secret project installed I need a licensed contractor. Not only do I need a licensed contractor, but they must have the license to do work in the public way (versus doing work on your private property).

The Chicago Department of Buildings publishes a continually updated list of licensed contractors on its website but it’s annoying to use. There’s no search, no permanent links, and if you leave the window open long enough this weird session manager kicks in and stops you from browsing to the next page of results.

I asked my followers on Twitter the best way to scrape the data. The ever-amusing Dan O’Neill, who leads the Smart Chicago Collaborative (which hosts the Chicago Crash Browser), recommended just copying and pasting all 10 pages. That would work fine for the first time, but I might need to do it a second time when the data updates. Nick Bennett jumped in and used Selenium, a tool that automates web browsers. He said, “it’s inefficient but for a small job like that I figured why bother with something faster”.

I imported the data into a MySQL table and ran through some of my “standard” data cleaning methods (like trimming leading and trailing spaces, removing odd characters, and extracting good information into other columns, like phone numbers and ZIP codes).
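Those cleaning steps look roughly like this – a sketch in Python (mine happened inside MySQL), applied to a made-up sample row:

```python
# Sketch of the cleanup described above, applied to one made-up row.
import re

row = {
    "name": "  ACME BUILDERS, INC.\u00a0",  # leading/trailing and non-breaking spaces
    "address": "123 W MADISON ST CHICAGO IL 60602 (312) 555-0142",
}

name = row["name"].replace("\u00a0", " ").strip()
zip_match = re.search(r"\b\d{5}(?:-\d{4})?\b", row["address"])
phone_match = re.search(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}", row["address"])

cleaned = {
    "name": name,
    "zip": zip_match.group(0) if zip_match else None,        # -> "60602"
    "phone": phone_match.group(0) if phone_match else None,  # -> "(312) 555-0142"
}
print(cleaned)
```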

With PHP – my favorite web language – I created a single-page website that loads all 3,930 licensed general contractors extremely fast and uses the DataTables JavaScript library to enhance the table with search and sorting. I used Bootstrap to make the design responsive, meaning it adjusts to fit multiple screen sizes, including smartphones and tablets.

I call it LicensedChicagoContractors.com.

The new website still doesn’t solve my problem of finding a company that can do work in the public way – I’m still working on this. The last online dataset I could find is on the city’s old http://egov.cityofchicago.org domain, and was cached by the Internet Archive’s Wayback Machine on January 25, 2010. Ideally this information – plumbers, public way, and general contractors – should be posted on the City’s data portal.

One day left to enter the Divvy Data Challenge

Divvy dock post-polar vortex

Divvy bikes have been covered in snow frequently this winter. Photo by Jennifer Davis.

As self-proclaimed Divvy Data Brigade Captain* in Chicago’s #opendata and #opengov community I must tell you that all Divvy Data Challenge submissions are due tomorrow, Tuesday, March 11. Divvy posted:

Help us illustrate the answers to questions such as: Where are riders going? When are they going there? How far do they ride? What are top stations? What interesting usage patterns emerge? What can the data reveal about how Chicago gets around on Divvy?

We’re interested in infographics, maps, images, animations, or websites that can help answer questions and reveal patterns in Divvy usage. We’re looking for entries to tell us something new about these trips and show us what they look like.

I’ve seen a handful of the entries so far, including some to which I’ve contributed, and I’m impressed. When the deadline passes I’ll feature my favorites.

Want to play with the data? You should start with these resources, in order:

  1. Divvy Data Challenge – rules and data download
  2. divvy-munging – download an enhanced version of Divvy’s data, with input from several #ChiHackNight hackers
  3. Bike Sharing Data Hackpad – this is where I’m consolidating all of the links to projects, visualizations, analysis, data, and blog posts.
  4. Divvy Data Google Group – a discussion group with over 25 members
  5. #DivvyData – chat on Twitter

It’s not too late to get started now on a project about the bikes themselves. Nick Bennett has crunched the numbers on the bikes’ activity and posted them to the Divvy Data Google Group. Want to use his data and initial analysis? He said “run with it”.

Share your work ahead of time and leave a comment with a link to your project.

* This title is a play on Christopher Whitaker’s position as Code For America Brigade Captain and all around awesome-doer of keeping track of everything that’s going on in these communities and publishing event write-ups on Smart Chicago Collaborative.

How Chicagoans commute map: An interview with the cartographer

Chicago Commute Map by Transitized

A screenshot of the map showing Lakeview and the Brown, Red, Purple and Purple Line Express stations.

Shaun Jacobsen blogs at Transitized.com and yesterday published the How Chicagoans Commute map. I emailed him to get some more insight on why he made it, how, and what insights it tells about Chicago and transit. The map color-symbolizes census tracts based on the simple majority commuting transportation mode.

What got you started on it?

It was your post about the Census data and breaking it down by ZIP code to show people how many homes have cars. I’ve used that method a few times. The method of looking up each case each time it came up took too long, so this kind of puts it in one place.

What story did you want to tell?

I wanted to demonstrate that many households in the city don’t have any cars at all, and these residents need to be planned for as well. What I really liked was how the north side transit lines stuck out. Those clearly have an impact on how people commute, but I wonder what the cause is. Are the Red and Brown Lines really good lines (in people’s opinions) so they take them, or are people deciding to live closer to the lines because they want to use it (because they work downtown, for example)?

The reason I decided to post the map on Thursday was that I was writing the story about a proposed development in Uptown and wanted information on how many people had cars around that development. As the map shows, almost all of Uptown is transit-commuting, and a lot of us don’t even own any cars.

What data and tools did you use?

I first used the Chicago Data Portal to grab the census tract boundaries. Then I grabbed all of the census data for B08141 (“means of transportation to work by number of vehicles available”) and DP04 (“selected housing characteristics”) for each tract and combined it using the tract ID and Excel’s VLOOKUP formula.

Read the rest of this interview on Web Map Academy.

Getting a little closer to understanding Chicago’s pothole-filling performance status

In response to my question about how many potholes the city fills within 72 hours – the Chicago Department of Transportation’s performance measure – Tom Kompare updated his web application that tracks pothole repairs using information from the city’s data portal.

He wrote to me via the Open Government Chicago group:

Without completely rewriting http://potholes.311services.org, I added a count of the number of open (not yet addressed) pothole repair tickets (requests) that exceed 3 days old. As of today, the data from the City of Chicago’s Data Portal shows 1,334 of the 1,404 open tickets in the 311 system are older than three days.

Full disclosure: The web app actually looks for greater than 4 days old. The Data Portal’s pothole data are only updated once a day, so these data are always a day old. 4 – 1 = 3.

Keep in mind that this web app only shows how many are yet to be addressed, and does not count how many have been patched within CDOT’s 3-day goal during some arbitrary time period. That is a much more intense calculation than this pure client-side JavaScript web application can handle due to bandwidth restrictions on mobile (3G/4G). This web app already pushes the mobile envelope with the amount of data downloaded. I can fix that, but, again, not without a rewrite.

Still, 1,334 open repair requests (12/16/2013 Data Portal data) is quite different than the number of open repair requests reported by CDOT (560 in Alley, 193 on street) on 12/16/2013. I’m not sure what is the difference.

This reminds me of a third issue with the way CDOT is presenting pothole performance data online (the first being that it’s PDF, the second that it doesn’t work in Safari). The six PDF files are overwritten for every new day of data. If you want information from two days ago, well you better have downloaded the PDF from two days ago!

© 2014 Steven Can Plan
