Iron Viz Geospatial: Integrating Spatial Analysis in QGIS with Tableau

For the Iron Viz Geospatial contest I decided to tackle a project that’s been in the back of my mind for a while: the Environmental Protection Agency’s (EPA) Superfund sites. You can learn about what Superfund sites are in the viz below, or, if you want a lot more detail, at the EPA website. Since a good portion of my work was behind the scenes, I thought I’d focus the blog on obtaining the data and doing spatial analysis of the shapefiles in QGIS.

With the demise of the EPA on a lot of people’s minds, I wasn’t sure that I’d be able to find the data I wanted. I did eventually find it, but they did not make it easy. When you navigate to the EPA Environmental Dataset Gateway and search “superfund” you find a lot of records. Some are links to websites, some are region specific, and some are site specific. Since I was interested in New Jersey, I tried to narrow the results by searching for “Superfund Region 2.” This gave me a shorter list of results, including one that seemed promising: “EPA Region 2 Draft NPL Site Contamination Area Boundaries as of February 2007 GIS Layer [EPA.R2_NPL_CONTAMBND].” Unfortunately, when I clicked open, the link just redirected to the EPA Environmental Dataset Gateway. I opened the details webpage and found that the download link was in fact the Environmental Dataset Gateway homepage. I searched the website again, but it appears that the dataset isn’t available anymore. The same thing happened for all of the shapefiles for Region 2. I went back to my general “superfund” search and eventually found my way to the Geospatial Data Download Service. On this page I found a geodatabase that included point locations for every facility registered with the EPA (a 572 MB zipped file!).

I opened QGIS (my go-to GIS software), added the geodatabase, and waited while it loaded all 4,246,792 data points. The data points by themselves outlined the US and its territories. Since I didn’t need all 4+ million sites, I filtered the data by selecting only sites that were listed as Superfund sites on the National Priorities List (NPL) (the ones prioritized for cleanup). I saved this selection as a shapefile for use on the national scale. I then filtered the smaller Superfund NPL dataset for sites in New Jersey. I had a bit of cleaning up to do, as there were a few sites mistakenly listed in the state of New Jersey.

All registered EPA facilities in QGIS

I started another QGIS project file and added my NJ census tract shapefile (conveniently available here) and my NJ NPL Superfund sites. The census file was in a coordinate system with units in feet, and the Superfund sites were in degrees, so I needed to reproject one of them to use them together. Since I wanted to create buffers and feet are easier to use for that, I chose to reproject the Superfund shapefile.

NJ census shapefile and reprojected Superfund sites

I had done some research on how living near these Superfund sites affects human health, and decided to investigate how many people lived within a quarter mile of these Superfund sites. I used the buffer tool in QGIS to create circular polygons around each site with a radius of 1320 feet. I then used the clip tool to clip the census tracts to the extent of the buffers.

NJ census shapefile clipped to Superfund buffers
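To make the buffer distance concrete, here is a minimal sketch in plain Python (not the QGIS tool itself) that converts a quarter mile to feet and approximates a circular buffer as a polygon. The function names and the 64-segment default are my own assumptions for illustration; GIS buffer tools also approximate circles with line segments, so the buffered area comes out slightly smaller than a true circle.

```python
import math

FEET_PER_MILE = 5280


def buffer_radius_feet(miles):
    """Convert a buffer distance in miles to feet."""
    return miles * FEET_PER_MILE


def circular_buffer(x, y, radius, segments=64):
    """Approximate a circular buffer around (x, y) as a list of polygon vertices.

    The segment count is a hypothetical default chosen for this sketch.
    """
    return [
        (
            x + radius * math.cos(2 * math.pi * i / segments),
            y + radius * math.sin(2 * math.pi * i / segments),
        )
        for i in range(segments)
    ]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


radius = buffer_radius_feet(0.25)          # 1320.0 feet
ring = circular_buffer(0.0, 0.0, radius)   # site placed at the origin for illustration
print(radius, polygon_area(ring), math.pi * radius ** 2)
```

The coordinates only make sense in a projected coordinate system with units in feet, which is why the Superfund points were reprojected before buffering.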

I opened the attribute table for my newly created clip layer and added a new calculated field to estimate how many people lived within the clipped portion of each census tract polygon. I did this by dividing the clipped area by the original area of the census tract and multiplying that proportion by the tract’s total population. It’s not going to be exact, but it will give an estimate of how many people live near these polluted sites. I did the same for women of childbearing age (18-40 for my calculation) and children (aged 0-17).

Calculation to estimate population near Superfund sites
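The same calculation can be sketched outside of QGIS. This is a minimal Python version of the area-proportion estimate described above; the function name and the sample numbers are made up for illustration and are not values from the census data. It implicitly assumes people are spread evenly across each tract, which is why the result is only an estimate.

```python
def estimate_population(clipped_area, tract_area, tract_population):
    """Estimate the population inside the clipped part of a census tract.

    Multiplies the tract population by the share of the tract's area
    that falls inside the buffer.
    """
    if tract_area <= 0:
        raise ValueError("tract_area must be positive")
    proportion = clipped_area / tract_area
    return proportion * tract_population


# Hypothetical tract: 10,000,000 sq ft total, 2,500,000 sq ft inside a buffer,
# 4,000 residents in the whole tract.
print(estimate_population(2_500_000, 10_000_000, 4_000))  # 1000.0
```

The same function works for any subgroup count, such as children or women of childbearing age, by passing that subgroup's tract total as the population.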

Finally, I needed to join the census buffer layer to the Superfund NPL site layer. I did this using Join Attributes by Location, specifying my buffer and clipped census shapefiles and that I wanted the sum of the intersecting features. Since the output matched the shape of the buffer, and I wanted points for my Tableau viz, I applied the geometry tool Polygon Centroids to get my data back into point format.

Joining clipped census tract with buffers
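The summing part of that join can be illustrated with a small Python sketch. It skips the spatial test entirely and assumes each clipped tract piece already carries the ID of the buffer it falls in; the field names (`site_id`, `est_pop`, `est_children`) and the values are hypothetical.

```python
from collections import defaultdict


def sum_by_site(pieces, fields):
    """Sum the given numeric fields across all clipped pieces for each site."""
    totals = defaultdict(lambda: {field: 0.0 for field in fields})
    for piece in pieces:
        for field in fields:
            totals[piece["site_id"]][field] += piece[field]
    return dict(totals)


# Hypothetical clipped tract pieces, two of which fall in the same buffer.
pieces = [
    {"site_id": "A", "est_pop": 120.5, "est_children": 30.0},
    {"site_id": "A", "est_pop": 79.5, "est_children": 20.0},
    {"site_id": "B", "est_pop": 300.0, "est_children": 75.0},
]

totals = sum_by_site(pieces, ["est_pop", "est_children"])
print(totals["A"]["est_pop"])  # 200.0
```

Each site ends up with one row of summed estimates, which is the shape of data a point layer in Tableau needs.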

With that, I was ready to head over to Tableau and build the viz. The result is below:

(Click image to interact on Tableau Public)