Iron Viz Geospatial: Integrating Spatial Analysis in QGIS with Tableau

For the Iron Viz Geospatial contest I decided to tackle a project that’s been in the back of my mind for a while: the Environmental Protection Agency’s (EPA) Superfund sites. You can learn about what Superfund sites are in the viz below, or, if you want a lot more detail, at the EPA website. Since a good portion of my work was behind the scenes, I thought I’d focus the blog on obtaining the data and doing spatial analysis of the shapefiles in QGIS.

With the demise of the EPA on a lot of people’s minds, I wasn’t sure that I’d be able to find the data I wanted. I did eventually find it, but they did not make it easy. When you navigate to the EPA Environmental Dataset Gateway and search “superfund,” you find a lot of records. Some are links to websites, some are region-specific, and some are site-specific. Since I was interested in New Jersey, I tried to filter the results by searching for “Superfund Region 2.” This gave me a shorter list of results, including one that seemed particularly promising: “EPA Region 2 Draft NPL Site Contamination Area Boundaries as of February 2007 GIS Layer [EPA.R2_NPL_CONTAMBND].” Unfortunately, when I clicked open, the link just redirected to the EPA Environmental Dataset Gateway. I opened the details webpage and found that the download link was in fact the Environmental Dataset Gateway homepage. I searched the website again, but it appears that the dataset isn’t available anymore. The same thing happened for all of the shapefiles for Region 2. I went back to my general “superfund” search and eventually found my way to the Geospatial Data Download Service. On this page I found a geodatabase that included point locations for every facility registered with the EPA (a 572 MB zipped file!).

I opened QGIS (my go-to GIS software), added the geodatabase, and waited while it loaded all 4,246,792 data points. The data points by themselves outlined the US and its territories. Since I didn’t need all 4+ million sites, I filtered the data by selecting only sites that were listed as Superfund sites on the National Priorities List (NPL), the sites prioritized for cleanup. I saved this selection as a shapefile for use on the national scale. I then filtered the smaller Superfund NPL dataset for sites in New Jersey. I had a bit of cleaning up to do, as there were a few sites mistakenly listed in the state of New Jersey.

All registered EPA facilities in QGIS
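
The same two-step filter can be sketched outside QGIS in plain Python. This is only an illustration: the records, the field names (`program`, `state`, `lat`, `lon`), and the rounded New Jersey bounding box are all assumptions, not the actual schema of the EPA geodatabase. The bounding-box check is one simple way to flag sites whose state attribute says NJ but whose coordinates fall far outside the state.

```python
# Hypothetical records; the real geodatabase uses its own field names.
facilities = [
    {"name": "Site A", "program": "SUPERFUND NPL", "state": "NJ", "lat": 40.2, "lon": -74.5},
    {"name": "Site B", "program": "AIR", "state": "NJ", "lat": 40.7, "lon": -74.2},
    # Labeled NJ, but the coordinates are nowhere near New Jersey.
    {"name": "Site C", "program": "SUPERFUND NPL", "state": "NJ", "lat": 34.0, "lon": -118.2},
    {"name": "Site D", "program": "SUPERFUND NPL", "state": "PA", "lat": 40.0, "lon": -75.2},
]

# Approximate bounding box for New Jersey in degrees (an assumption, rounded outward).
NJ_BBOX = {"min_lat": 38.8, "max_lat": 41.4, "min_lon": -75.6, "max_lon": -73.8}


def in_bbox(record, bbox):
    """Return True if the record's coordinates fall inside the bounding box."""
    return (bbox["min_lat"] <= record["lat"] <= bbox["max_lat"]
            and bbox["min_lon"] <= record["lon"] <= bbox["max_lon"])


# Step 1: keep only Superfund NPL sites (the national-scale selection).
npl_sites = [r for r in facilities if r["program"] == "SUPERFUND NPL"]

# Step 2: keep sites labeled NJ, then split off the ones with implausible coordinates.
nj_listed = [r for r in npl_sites if r["state"] == "NJ"]
nj_sites = [r for r in nj_listed if in_bbox(r, NJ_BBOX)]
suspect = [r for r in nj_listed if not in_bbox(r, NJ_BBOX)]

print([r["name"] for r in nj_sites])
print([r["name"] for r in suspect])
```

A bounding box only catches gross errors; a site just across the state line would still need a check against the actual state boundary.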

I started another QGIS project file and added my NJ census tract shapefile (conveniently available here) and my NJ NPL Superfund sites. The census file was in a coordinate system with units in feet, and the Superfund sites were in degrees, so I needed to reproject one of them to use them together. Since I wanted to create buffers and feet are easier to use for that, I chose to reproject the Superfund shapefile.

NJ census shapefile and reprojected Superfund sites
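
To see why the units matter for buffering, here is a rough sketch of how many feet a degree covers at a latitude of about 40°N, which runs through New Jersey. It assumes a spherical Earth with a mean radius of 6,371,008.8 m and the international foot, so the numbers are approximations and not what a projected coordinate system would give exactly.

```python
import math

EARTH_RADIUS_M = 6_371_008.8   # mean Earth radius; spherical approximation (assumption)
FEET_PER_METER = 1 / 0.3048    # international foot


def feet_per_degree(lat_deg):
    """Return (feet per degree of latitude, feet per degree of longitude)."""
    meters_per_degree = math.pi / 180 * EARTH_RADIUS_M
    ft_lat = meters_per_degree * FEET_PER_METER
    # Meridians converge toward the poles, so a degree of longitude shrinks with latitude.
    ft_lon = ft_lat * math.cos(math.radians(lat_deg))
    return ft_lat, ft_lon


ft_lat, ft_lon = feet_per_degree(40.0)
quarter_mile_ft = 5280 / 4

# The same quarter mile is a different number of degrees north-south than east-west.
deg_lat = quarter_mile_ft / ft_lat
deg_lon = quarter_mile_ft / ft_lon

print(f"1 degree of latitude  ~ {ft_lat:,.0f} ft")
print(f"1 degree of longitude ~ {ft_lon:,.0f} ft")
print(f"quarter mile ~ {deg_lat:.5f} deg N-S, {deg_lon:.5f} deg E-W")
```

Because a degree is not a fixed distance, a buffer drawn with a single radius in degrees would not be a true quarter-mile circle on the ground, which is the reason for working in a projected system with units in feet.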

I had done some research on how living near these Superfund sites affects human health, and decided to investigate how many people lived within a quarter mile of these Superfund sites. I used the buffer tool in QGIS to create circular polygons around each site with a radius of 1320 feet. I then used the clip tool to clip the census tracts to the extent of the buffers.

NJ census shapefile clipped to Superfund buffers
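
As a rough picture of what a buffer tool produces, the sketch below approximates a quarter-mile circle around one point with a many-sided polygon and checks its area with the shoelace formula. The site coordinates and the choice of 36 vertices are assumptions for illustration; they are not settings taken from the project.

```python
import math

QUARTER_MILE_FT = 5280 / 4  # 1,320 feet


def buffer_point(x, y, radius, segments=36):
    """Approximate a circular buffer as a polygon with `segments` vertices."""
    return [
        (x + radius * math.cos(2 * math.pi * i / segments),
         y + radius * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]


def polygon_area(ring):
    """Shoelace formula; `ring` is a list of (x, y) vertices, not closed."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


site = (500_000.0, 600_000.0)  # hypothetical projected coordinates, in feet
buf = buffer_point(site[0], site[1], QUARTER_MILE_FT)
area = polygon_area(buf)
circle_area = math.pi * QUARTER_MILE_FT ** 2

print(f"polygon area: {area:,.0f} sq ft ({area / circle_area:.2%} of a true circle)")
```

The polygon is inscribed in the circle, so its area is always slightly smaller than the true circle; more vertices shrink that gap.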

I opened the attribute table for my newly created clip layer and added a new calculated field to estimate how many people lived within the clipped portion of each census tract polygon. I did this by dividing the clipped area by the original area of the census tract and multiplying that proportion by the tract’s total population. It’s not going to be exact, since it assumes people are spread evenly across each tract, but it will give an estimate of how many people live near these polluted sites. I did the same for women of childbearing age (18-40 for my calculation) and children (aged 0-17).

Calculation to estimate population near Superfund sites
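
The calculation described above is a simple area-weighted estimate. A minimal sketch, with a made-up tract and hypothetical names (`estimate_population`, `tract_area`, `clipped_area`):

```python
def estimate_population(total_pop, tract_area, clipped_area):
    """Scale a tract's population by the share of its area inside the buffer.

    Assumes people are spread evenly across the tract, which is rarely
    exactly true, so the result is an estimate only.
    """
    if tract_area <= 0:
        raise ValueError("tract_area must be positive")
    share = clipped_area / tract_area
    return total_pop * share


# Hypothetical tract: 4,000 residents, one quarter of its area inside a buffer.
est = estimate_population(total_pop=4000, tract_area=2_000_000.0, clipped_area=500_000.0)
print(est)  # 1000.0
```

The same function works for any population subgroup: pass the subgroup count for the tract in place of the total population.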

Finally, I needed to join the clipped census layer to the Superfund NPL site layer. I did this using Join Attributes by Location, specifying my buffer and clipped census shapefiles and that I wanted the sum of the intersecting features. Since the output matched the shape of the buffer, and I wanted points for my Tableau viz, I applied the geometry tool Polygon Centroids to get my data back into point format.

Joining clipped census tract with buffers
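
The idea behind the join can be sketched in plain Python: for each buffer, sum an attribute over the clipped pieces that fall inside it, then keep the buffer’s center as the output point. This is a deliberate simplification. The real tool tests whether geometries intersect, whereas here each clipped piece is reduced to a single representative point, and all names and values are made up.

```python
import math

# Hypothetical buffers (projected coordinates in feet).
buffers = [
    {"site": "Site A", "x": 0.0, "y": 0.0, "radius": 1320.0},
    {"site": "Site B", "x": 10_000.0, "y": 0.0, "radius": 1320.0},
]

# Hypothetical clipped tract pieces, each reduced to one representative point.
pieces = [
    {"x": 100.0, "y": 200.0, "est_pop": 150.0},
    {"x": -300.0, "y": 50.0, "est_pop": 75.5},
    {"x": 10_200.0, "y": -100.0, "est_pop": 40.0},
]


def join_sum(buffers, pieces, field):
    """Sum `field` over the pieces inside each buffer; return one point per buffer."""
    joined = []
    for b in buffers:
        total = sum(
            p[field]
            for p in pieces
            if math.hypot(p["x"] - b["x"], p["y"] - b["y"]) <= b["radius"]
        )
        # The centroid of a circular buffer is its center, so the output is a point.
        joined.append({"site": b["site"], "x": b["x"], "y": b["y"], field: total})
    return joined


points = join_sum(buffers, pieces, "est_pop")
for pt in points:
    print(pt)
```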

So, finally I was ready to head over to Tableau and build the viz. What resulted is below:

(Click image to interact on Tableau Public)

Is the 9th Circuit really the court of appeals most overturned by the US Supreme Court?

This project was done for the KDnuggets Data Science vs. Fake News contest.

In late January of 2017, President Donald Trump signed an executive order that blocked immigration and travel from seven countries in Africa and the Middle East. This executive order was met with a lot of resistance. Several states sued the Trump administration, and a federal trial court judge ruled that the government could not carry out the executive order while the trial was pending. The Trump administration appealed this court decision to the 9th Circuit Court of Appeals, which affirmed the initial decision. On February 9, 2017, conservative news commentator Sean Hannity stated that he had been “predicting for days now the 9th circuit, the most liberal court of appeals, the most overturned court in the country — it would act this way.”

Debate ensued in the news media over whether the 9th Circuit Court of Appeals was the most overturned by the Supreme Court. The 9th Circuit encompasses a larger area than each of the other eleven courts of appeals, resulting in a larger overall caseload. Cases in the 9th Circuit make up about one fifth (20%) of all court of appeals cases nationwide.

This larger caseload results in the Supreme Court taking more cases from the 9th Circuit and, in turn, reversing more of its decisions. While the 9th Circuit had the most decisions reversed by the US Supreme Court from 2000 to 2015, the 6th and 8th Circuits had higher reversal rates.
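
The difference between the number of reversals and the reversal rate is easy to show with a small sketch. The counts below are made up purely for illustration and are not the figures behind the viz; the rate is assumed here to mean reversals divided by the number of a circuit’s decisions the Supreme Court reviewed.

```python
# Made-up counts for illustration only; these are NOT the real figures.
circuits = {
    "Circuit X": {"reviewed": 100, "reversed": 75},
    "Circuit Y": {"reviewed": 20, "reversed": 17},
}


def reversal_rate(record):
    """Share of a circuit's reviewed decisions that were reversed."""
    return record["reversed"] / record["reviewed"]


# The circuit with the most reversals is not necessarily the one with the highest rate.
most_reversed = max(circuits, key=lambda name: circuits[name]["reversed"])
highest_rate = max(circuits, key=lambda name: reversal_rate(circuits[name]))

for name, record in circuits.items():
    print(f"{name}: {record['reversed']} reversals, rate {reversal_rate(record):.0%}")
print("Most reversals:", most_reversed)
print("Highest rate:", highest_rate)
```

In this toy example the larger circuit has far more reversals, yet the smaller circuit is reversed more often relative to the number of its cases reviewed.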

Is the 9th Circuit Court of Appeals really the most liberal court of appeals, as Sean Hannity claimed? From 2000 to 2008, during George W. Bush’s term in office, the Supreme Court consisted of 7 conservative justices and 2 liberal justices. During Barack Obama’s term in office, two conservative justices were replaced with two liberal justices, leaving a more even split. It follows that the more conservative Bush-era Supreme Court would reverse more decisions from a liberal court of appeals than would the more evenly split Obama-era Supreme Court. Supreme Court reversal rates of circuit court cases do vary considerably by presidential term. The reversal rate of the 9th Circuit decreased about 6% from Bush’s presidential term to Obama’s presidential term. However, other circuit courts saw much more variability in their reversal rates between the two terms. For example, the 10th Circuit saw a drop of over 40% in its reversal rate, and the 11th Circuit saw an increase of about 23%. Compared with the other courts of appeals, the 9th Circuit’s reversal rate has been relatively stable.

From this data, it certainly does not appear that the 9th Circuit Court of Appeals is at odds with the Supreme Court. While the Supreme Court does reverse more cases from the 9th Circuit than from any other circuit, the rate at which it does so is on par with other circuit courts. Additionally, the reversal rate of the 9th Circuit has not varied significantly as the ideological makeup of the Supreme Court has changed, suggesting that it is not strongly liberal.

Data Sources

Hacking Open Government Data for the Tableau Public Hackathon

In February of 2017, Tableau Public announced a hackathon to make open datasets approachable. They paired applicants into “Data Duos” to work in remote pairs on the various teams. I signed up for the government team (#VizzingGov). I was paired with a fellow data enthusiast named Neil Lord to work on this project.

Early on, we narrowed our topic to two choices: the impacts of refugees and the costs of social care. We chose to go with refugee impacts, as it was a very timely topic following the recent Brexit vote in the UK and the US presidential election. We decided on a two-fold approach: look at refugee data in a global scope, and then focus more locally on the impacts to a single country. Germany, the UK, and the US were the front runners for the local scale of our project. We divided up the research, with Neil working on the global scale and me working on the local scale.

I soon learned that the data available on refugees was voluminous and confusing. One country might consider a person a refugee, and another might consider them an asylee. The number of refugees counted by the UNHCR might not match the number reported in a country’s statistics. Some countries only reported the number of refugees and nothing more, which doesn’t make for very good storytelling. In the end, I chose to focus on the United Kingdom because it had the most recent and most thorough data. I also ended up learning a lot about the programs that the UK has in place for refugee support and resettlement.

The final product we made is something I am very proud of. As a self-taught Tableau user, I found it particularly interesting to see how someone else puts together a worksheet and a dashboard, and I learned a lot from working with Neil. The opportunity to work with a partner was an especially good experience for someone like me who is trying to break into the field of data analytics and visualization.

Anyway, I’m guessing you’re dying to see this visualization by now!

(Click image to interact on Tableau Public)

The Gender Gap in Postsecondary Education

Having spent so much time in postsecondary/college institutions, I was curious what the gender gap in employment rates and salaries was. While it has been improving over the years, there is still a long way to go.

Data Sources

  • National Center for Education Statistics, via data.world here and here

(Click image to interact on Tableau Public)