Data Source

The data here recreate the 2016 analysis from Vox, modeled off of the Washington State Department of Health’s analysis, to derive lead exposure risk scores. Scores are derived from tract-level Census Bureau data for “poverty status in the last 12 months” (Table S1701) and “year structure built” (Table B25034). The original analysis used 2014 ACS data, but my analysis here uses the most recent ACS data, from 2019.

How the Data Were Constructed

The key estimates in this dataset are a lead risk index and percentile score ranging from 1-10, with 1 indicating very little lead exposure risk (tract is in the bottom 10 percent for risk based on nation-wide ranking) and 10 indicating very high lead exposure risk (tract is in the top 10 percent for risk based on nation-wide ranking). These lead risk scores and ranks are based on the age of housing and poverty status, which have been shown to influence the risk of lead poisoning in children.[^1]

The lead exposure risk is estimated, first, as a function of the age of the housing stock times a nationally-derived estimate of the proportion of housing from each era with lead risks (see Jacobs et al (2002)):

  • \(Houses_{pre-1940} \times 0.68\)
  • \(Houses_{1940-1959} \times 0.43\)
  • \(Houses_{1960-1979} \times 0.08\)
  • \(Houses_{1980-1999} \times 0.03\)
  • \(Houses_{post-2000} \times 0.00\)

The resulting values are summed within census tracts and the sum is divided by the total number of households to generate housing risk.

Lead exposure risk is also higher in environments characterized by poverty. Poverty-induced risk is calculated by the the number of people under 125% of the poverty line divided by the total number of people within a tract.

The housing-induced risk and poverty-induced risk values are standardized to generate comparable scales and combined with additive weights to create the lead exposure risk score

\[lead\; exposure\; risk = housing\; risk \times 0.58 + poverty\; risk \times 0.42 \] The lead exposure risk score was ranked across all tracts in the US and each tract was assigned to a decline between 1 and 10. The data provided here is only for the Charlottesville area so that the risk rankings are relative to the nation as a whole.

Variable Descriptions

glimpse(leadrisk)
## Rows: 50
## Columns: 8
## $ GEOID             <chr> "51065020200", "51003010201", "51003010202", "510030…
## $ NAME              <chr> "Census Tract 202, Fluvanna County, Virginia", "Cens…
## $ countyfips        <chr> "065", "003", "003", "003", "003", "003", "003", "10…
## $ tractfips         <chr> "020200", "010201", "010202", "010401", "010402", "0…
## $ leadriskscore_raw <dbl> -0.14187013, -1.07303392, -0.79073840, -0.37250423, …
## $ lead_risk_rank    <dbl> 5, 1, 2, 4, 1, 3, 1, 2, 2, 3, 5, 5, 8, 10, 6, 5, 2, …
## $ housing_risk      <dbl> 24.440575, 6.057143, 11.327447, 18.447356, 5.575397,…
## $ poverty_risk      <dbl> 9.138218, 2.379958, 4.817329, 9.311255, 3.433333, 19…

Observations are census tract estimates of:

  • GEOID: 11-digit FIPS census tract code
  • NAME: Tract number, county, and state
  • countyfips: 3-digit county FIPS code
  • tractfips: 6-digit census tract FIPS code
  • leadriskscore_raw: Raw lead risk score (housing_risk*0.58 + poverty_risk*0.42)
  • lead_risk_rank: Lead risk rank on a scale of 1-10, with 1=very low lead risk and 10=very high lead risk
  • housing_risk: Estimated percent of houses that contain lead paint
  • poverty_risk: Percentage of people at or below 125% of the poverty line

Summaries

leadrisk %>% select(-c(GEOID:NAME)) %>% 
  select(where(~is.numeric(.x) && !is.na(.x))) %>% 
  as.data.frame() %>% 
  stargazer(., type = "text", title = "Summary Statistics", digits = 0,
            summary.stat = c("mean", "sd", "min", "median", "max"))
## 
## Summary Statistics
## ==============================================
## Statistic         Mean St. Dev. Min Median Max
## ----------------------------------------------
## leadriskscore_raw  -0     1     -1    -0    2 
## lead_risk_rank     4      3      1    4    10 
## housing_risk       13     8      2    11   33 
## poverty_risk       18     17     2    13   95 
## ----------------------------------------------

Visual Distributions

leadrisk %>% select(-c(NAME, countyfips, tractfips)) %>% 
  pivot_longer(-GEOID, names_to = "measure", values_to = "value") %>% 
  mutate(measure = factor(measure, levels = c("lead_risk_rank", "leadriskscore_raw", "housing_risk", "poverty_risk"))) %>% 
  ggplot(aes(x = value, fill = measure)) + 
  geom_histogram() + 
  facet_wrap(~measure, scales = "free") +
  xlab("Risk Scores") +
  guides(fill = "none")

Spatial Distributions

Lead Exposure Risk Rank

pal <- colorFactor("viridis", reverse = TRUE, domain = leadrisk$lead_risk_rank)

leaflet(leadrisk) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = leadrisk,
              fillColor = ~pal(lead_risk_rank),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0(leadrisk$NAME.y, "<br>",
                             "Lead Risk Rank: ", leadrisk$lead_risk_rank)) %>% 
  addLegend("bottomright", pal = pal, values = leadrisk$lead_risk_rank,
            title = "Lead Risk Rank", opacity = 0.7)

Lead Exposure Risk Score

pal <- colorNumeric("viridis", reverse = TRUE, domain = leadrisk$leadriskscore_raw)

leaflet(leadrisk) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = leadrisk,
              fillColor = ~pal(leadriskscore_raw),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0(leadrisk$NAME.y, "<br>",
                             "Raw Lead Risk Score: ", round(leadrisk$leadriskscore_raw, 2))) %>% 
  addLegend("bottomright", pal = pal, values = leadrisk$leadriskscore_raw,
            title = "Raw Lead Risk Score", opacity = 0.7)

Housing-induced Risk Score

pal <- colorNumeric("viridis", reverse = TRUE, domain = leadrisk$housing_risk)

leaflet(leadrisk) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = leadrisk,
              fillColor = ~pal(housing_risk),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0(leadrisk$NAME.y, "<br>",
                             "Housing Risk: ", round(leadrisk$housing_risk, 2))) %>% 
  addLegend("bottomright", pal = pal, values = leadrisk$housing_risk,
            title = "Housing Risk", opacity = 0.7)

Poverty-induced Risk Score

pal <- colorNumeric("viridis", reverse = TRUE, domain = leadrisk$poverty_risk)

leaflet(leadrisk) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = leadrisk,
              fillColor = ~pal(poverty_risk),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0(leadrisk$NAME.y, "<br>",
                             "Poverty Risk: ", round(leadrisk$poverty_risk, 2))) %>% 
  addLegend("bottomright", pal = pal, values = leadrisk$poverty_risk,
            title = "Poverty Risk", opacity = 0.7)

{^1]: There are other factors that influence lead exposure, but it is often hard to get good data on those factors.