Background

This section provides background information useful for non-technical audiences and first-time users of Census data.

US Census Bureau

The US Census Bureau provides current data on America’s people, places, and economy. Their mission is to serve as the nation’s leading provider of quality data about its people and economy. The Census Bureau works to honor privacy, protect confidentiality, share their expertise globally, and conduct their work openly.1

The American Community Survey (ACS)

This documentation will focus on the American Community Survey (ACS), an ongoing annual survey that shows what the US population looks like and how it lives.2 The ACS is a tool that can help communities decide where to target services and resources by helping them understand the changes happening in their communities. It is an important source of detailed population and housing information.

The US Census ACS

The ACS releases new data each year in the form of estimates. These estimates are presented in a variety of tables, tools, and analytical reports. The ACS data can be accessed through the Census Bureau’s platform, data.census.gov. The ACS has 1-year3 and 5-year estimates of the data.4 It is important to consider currency and sample size/reliability/precision in choosing which dataset to use. The table “Distinguishing features of ACS 1-year, 1-year supplemental, 3-year, and 5-year estimates”5 summarizes details, research implications, and examples to help guide data users.

For the purposes of this documentation, we will focus on accessing ACS 5-year estimates.

Understanding ACS Tables

For the scope of this project, we will be focusing on the following table codes. You can find more documentation around ACS table codes here.

ACS Demographic and Housing Characteristics [DP05]

Questions regarding race/ethnicity

The DP05 table provides detailed information on population estimates by race and ethnicity. The data profile (DP) tables provide a lot of popular statistics across subjects - in this case, demographic and housing characteristics - in a single table. We are particularly interested in counts and percentages of various racial and ethnic groups in a given area. The US Census defines race here as a subset based on responses to the race or Hispanic/Latino-origin questions.

These provide the following possible categories:

Table Code   Race/Ethnicity Label
A            White Alone
B            Black or African American Alone
C            American Indian and Alaska Native Alone
D            Asian Alone
E            Native Hawaiian and Other Pacific Islander Alone
F            Some Other Race Alone
G            Two or More Races
H            White Alone, Not Hispanic or Latino
I            Hispanic or Latino

Out of these nine groups, we consistently focus on the following six across all variables when stratifying by race/ethnicity, which helps prevent overlap in our race/ethnicity categories:

B. Black or African American Alone

C. American Indian and Alaska Native Alone

D. Asian Alone

E. Native Hawaiian and Other Pacific Islander Alone

H. White Alone, Not Hispanic or Latino

I. Hispanic or Latino

Sex by Age [B01001A-I tables]

The B01001 table provides sex by age - these are the estimated counts of men and women in age buckets, mostly of five years (e.g. 25 to 29 years, 30 to 34 years). We focus on this table to get race and ethnicity (%) estimates of children. Certain age groups - including the later teens and the 60’s - have brackets of differing sizes; there are also columns for just 20-year-olds and for just 21-year-olds.

The letters A through I provide versions of this table by race and ethnicity; these letter codes follow the same race/ethnicity codes discussed earlier. For example, B01001A provides sex by age for White alone and B01001B provides sex by age for Black or African American alone. Aggregating by sex and age for each of these tables provides the number of children by each race and ethnicity. This can be useful for understanding the demographic composition of children within a community. You can find more detailed information on the table here.
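As a rough sketch of how the letter codes translate into variable names, the snippet below builds the under-18 variable codes for the six race/ethnicity tables used in this project. The male positions (003 to 006) match the B01001A metadata listing shown later in this walkthrough; the female positions (018 to 021) are an assumption and should be confirmed against the variable metadata before use. The helper name child_vars and the short group labels are hypothetical.

```r
# Letter suffixes for the six race/ethnicity groups used in this project
race_codes <- c(Black = "B", AmerIndian = "C", Asian = "D",
                PacifIslan = "E", White = "H", HispanLatin = "I")

# Hypothetical helper: under-18 variable codes for one B01001 sub-table
# 003-006 = males under 18; 018-021 = females under 18 (assumed, verify in metadata)
child_vars <- function(letter) {
  sprintf("B01001%s_%03d", letter, c(3:6, 18:21))
}

# Named list with one vector of eight variable codes per group
child_codes <- lapply(race_codes, child_vars)
child_codes$Black
```

Summing the estimates for these eight codes within one sub-table would give an approximate count of children for that group.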

Questions regarding race/ethnicity

Poverty Status in the Past 12 Months [S1701]

Questions regarding poverty status

The S1701 table provides data on poverty rates, including the number and percentage of individuals and families living below the poverty line. The ACS is the only source for official poverty data down to the community level, making it invaluable as the official poverty measure has direct policy implications, such as the allocation of federal funds to help low income individuals and families at the state and locality levels. The S1701 table provides poverty status in the past 12 months by age, sex, race and ethnicity, educational attainment, employment status, and work experience. The S1702 table provides poverty status in the past 12 months of families. Other tables include B17001 (poverty status in the past 12 months by sex by age), B18131 (age by ratio of income to poverty level in the past 12 months by disability status and type), and B17022 (ratio of income to poverty level in the past 12 months of families by family type by presence of related children under 18 years by age of related children).

While people colloquially refer to the poverty level as a singular concept, the Census Bureau actually uses a set of poverty thresholds to determine who is in poverty. Forty-eight different poverty thresholds are computed, based mostly on family size, with separate values for senior citizens in one- and two-person households. Only income before taxes is counted; non-cash benefits such as public housing, Medicaid, and food stamps are excluded. Poverty is not calculated for households; rather, the whole household is assigned the poverty status of the person who filled out the form. Poverty is also not determined for people living in group quarters such as prisons, college dormitories, military quarters, and nursing homes, or for homeless people unless they are living in shelters. Finally, it is not determined for children under 15 who are not living with their families, such as foster children and children living with non-relatives or on their own.

Note: Same-sex married couples were not treated as married by the Census until 2013, even if legally married. Until that point, only the income of the householder and any of their children was included in the poverty calculations.

More information on how poverty is determined by the Census Bureau can be found at How the Census Bureau Measures Poverty6.

Household Income [B19001]

Most income data is reported at the household level, but there are tables which are focused on family households, as well as some which report on individual income. Income is reported in a number of ways: as a median value or as a series of medians for groups within a total population; as a number of people or households earning in a certain bracket; as well as a few other structures. To calculate median household income, we consider the B19001 tables, which provide frequency counts of the number of households whose incomes fall within each of the income brackets below:

  • Less than $10,000
  • $10,000 to $14,999
  • $15,000 to $19,999
  • $20,000 to $24,999
  • $25,000 to $29,999
  • $30,000 to $34,999
  • $35,000 to $39,999
  • $40,000 to $44,999
  • $45,000 to $49,999
  • $50,000 to $59,999
  • $60,000 to $74,999
  • $75,000 to $99,999
  • $100,000 to $124,999
  • $125,000 to $149,999
  • $150,000 to $199,999
  • $200,000 or more

We cannot aggregate medians the same way we do for other measures; instead, we use the following guide to calculate an approximate median from the frequency counts in each income bracket.7
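To make the idea concrete, here is a minimal sketch of one common way to approximate a median from bracketed counts: find the bracket that contains the middle household, then interpolate linearly within it. This assumes households are spread evenly within each bracket; the cited guide should be consulted for the exact procedure it recommends. The function name approx_median and the household counts are hypothetical, while the bracket boundaries follow the B19001 brackets listed above.

```r
# Hypothetical helper: approximate median from bracketed frequency counts
approx_median <- function(lower, upper, households) {
  half <- sum(households) / 2          # position of the middle household
  cum  <- cumsum(households)           # running total across brackets
  i    <- which(cum >= half)[1]        # bracket containing the median
  # The top bracket is open-ended, so interpolation is not possible there
  if (is.infinite(upper[i])) return(NA_real_)
  below <- if (i == 1) 0 else cum[i - 1]
  lower[i] + (half - below) / households[i] * (upper[i] - lower[i])
}

# Lower bounds of the 16 B19001 brackets; each upper bound is the next lower bound
lower <- c(0, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 75, 100, 125, 150, 200) * 1000
upper <- c(lower[-1], Inf)

# Made-up household counts per bracket, for illustration only
households <- c(50, 40, 40, 40, 40, 40, 40, 40, 40, 80, 100, 150, 100, 60, 60, 40)

approx_median(lower, upper, households)  # 64500 for these made-up counts
```

When combining counties, the bracket counts would be summed across counties first and the function applied to the summed counts.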

When considering these tables, it is important to note that each race/ethnicity group is given its own sub-table. The letters A through I provide versions of this table by race and ethnicity. These letter codes follow the same race/ethnicity codes discussed earlier. To obtain the median household income, we will have to pull in each of the sub-tables for our race/ethnicity of interest and then join them together. You can find more detailed information on the table here.

Questions regarding income

Selected Characteristics of Health Insurance Coverage in the United States [S2701]

Questions regarding insurance

The S2701 table offers insights into the number and percentage of individuals without health insurance coverage. This can be crucial for understanding healthcare access within a community. The American Community Survey added questions about health insurance in 2008. The data shows health insurance coverage broken up by the following categories: age, sex, race, living arrangement, citizenship status, disability status, educational attainment, employment status, and household income. Edits were made to the health care questions in 2009 that make direct comparison with the first year responses impossible without adjustments. In 2012, tables were added to count health insurance coverage for people aged 19 to 25.

Health Insurance Coverage Status by Sex by Age [B27001]

We use the B27001 table to focus on children without health insurance. The B27001 table series provides fine-grained information about general health insurance coverage, broken down into nine age groups and nine race/ethnicity groups. The ACS questions regarding health insurance status, added in 2008, have been updated in subsequent years.

Note: For broader age categories, consider table C27001, which contains three age groups (child, adult, senior citizen). For other insurance-related information, consider tables B27002 - B27023.

When considering these tables, it is important to note that all data is disaggregated by age and each race/ethnicity group is given its own sub-table. The letters A through I provide versions of this table by race and ethnicity. These letter codes follow the same race/ethnicity codes discussed earlier. To obtain the counts of children without health insurance, we will have to pull in each of the sub-tables for our race/ethnicity of interest and then combine them. You can find more detailed information on the table here.

Questions regarding insurance

Employment Status [S2301]

Questions regarding employment status

The ACS provides employment data for many geographic levels and localities, covering employment status, retirement, occupation, and industry. The Census Bureau asks a number of questions about whether individuals are employed and, if they are employed and not retired, about their occupation and industry. Census Reporter collects these together under the tag ‘employment’. When talking about employment status, the Census Bureau divides the population 16 years and older into two categories: in the labor force or not in the labor force. People who have never worked or who are retired are not in the labor force. People who are not currently working but have worked recently and would like to work are considered in the labor force, but unemployed. People who are actively working are described as either in the civilian labor force or in the armed forces.

Note: The ACS may not be the best source for employment information depending on your needs. Consider the Current Population Survey (CPS), which provides employment statistics in monthly increments and is better suited for longitudinal analysis. The state-partnered Longitudinal Employer–Household Dynamics (LEHD) is another option.

Demographic Characteristics for Occupied Housing Units [S2502]

The American Community Survey (ACS) gathers extensive data about the housing conditions of respondents, including whether they own or rent their home, how much they spend on housing, and the physical characteristics of homes. The ACS primarily reports housing data in tables with codes beginning with 25. Most of the tables count the number of housing units for a given characteristic. However, a few tables estimate the number of people living in owned or rented housing units. A housing unit is anything from a house to an apartment or even a boat if a person is currently living there.

Homeownership percentage is a crucial indicator of housing stability and economic well-being in a community. The S2502 table in the ACS provides valuable information on homeownership rates within different geographical areas and for different race/ethnicity groups.

Questions regarding housing status

First Steps with Coding

Now, we move on to the technical aspect of the walkthrough, examining the tools and packages in R that allow us to pull data from the census and how we can leverage those tools to provide the information we need.

Introduction to R and R-Studio

R is a programming language and software environment for statistical computing and graphics. It serves as an integrated suite of software facilities for data manipulation, calculation and graphical display, including effective data handling, operators for calculations on arrays/matrices, and a collection of intermediate data analysis tools. R is an effective programming language widely used among statisticians and data miners for data analysis and developing statistical software. More information about R and links for installation can be found at their website.8

RStudio is an integrated development environment (IDE) for R.9 It provides a user-friendly interface for working with R, making it easier to write and run R code, view plots and charts, manage files, and more.

If this is the first time you are working with R, both must be installed; numerous guides walk you through the process.10, 11 If you already have R and RStudio installed, you can skip this section.

Introduction to tidycensus Package

tidycensus is an R package designed to help R users get Census data pre-prepared for use with tidyverse.12 It allows users to interface with a select number of the US Census Bureau’s data APIs and returns tidyverse-ready data frames, optionally with simple feature geometry included.

To install the package from an R environment, type and run the following command once:

# Only run ONCE (first time using package on local system)
install.packages("tidycensus")

After the first time installing the package, any future scripts that reference the package can have this line of code:

# Run at the start each time working with R script
library(tidycensus)

This will be enough to load the tidycensus package. You may also want to run library(tidyverse) at the start to load the tidyverse, a common domain-general collection of R packages for data work; it provides the %>% pipe and the dplyr functions used in later code.

Getting Census API Key

To access ACS data programmatically, you will need to obtain an API key from the Census Bureau. This key allows you to make requests to the Census API and retrieve data directly into your R environment. To request your own API key, visit the key signup page.13 The form requires an organization name (or an individual’s name) and an email address. You can load the API key into R by running the following code:

# Assign the census API key to an object named "census_api"
census_api <- "f1a2b3c4d5e6f7g8h9i0j1k2l3m4n5o6p7q8r9s0t1u2v3w4x5y6z" # NOT REAL KEY

NOTE: Your API key is private information. Take care not to share the string value or make it publicly available.
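One way to keep the key out of your scripts is to store it in an environment variable and read it back with Sys.getenv(), which is how the county aggregation example later in this walkthrough retrieves it. The sketch below uses a placeholder value rather than a real key. The variable name CENSUS_API_KEY is the name that the census_api_key() helper in tidycensus writes to your .Renviron file when called with install = TRUE.

```r
# Set the key for the current R session only (placeholder, NOT a real key)
Sys.setenv(CENSUS_API_KEY = "your-key-here")

# Read the key back without hard-coding it in the script
census_api <- Sys.getenv("CENSUS_API_KEY")

# Fail early with a clear message if the key is missing
if (!nzchar(census_api)) stop("CENSUS_API_KEY is not set")

# To persist the key across sessions, tidycensus offers (run once):
# tidycensus::census_api_key("your-key-here", install = TRUE)
```

With this approach, a script can be shared or committed to version control without exposing the key itself.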

Finding Other Variables of Interest

The variables we described earlier were the focus of this project but there may be scenarios where you are interested in pulling your own case-specific variables.

Helper Function: Pulling All MetaData

To find table codes in R, we can create a custom function in R that can combine all the metadata of interest into a table we can search through. The function is shown here:

# A custom R function that creates a table of all variable codes and metadata
# Requires tidycensus (load_variables) and dplyr (%>%, select, any_of)
library(tidycensus)
library(dplyr)

all_acs_meta <- function(year){

  # Gets the list of all variables from all acs5 metadata tables
  # any_of() removes the geography column when present and does nothing otherwise
  vars1 <- load_variables(year, "acs5") %>% select(-any_of("geography"))
  vars2 <- load_variables(year, "acs5/profile")
  vars3 <- load_variables(year, "acs5/subject")
  vars4 <- load_variables(year, "acs5/cprofile")

  # Adds a column recording which dataset each variable came from
  vars1$dataset_table <- "acs5"
  vars2$dataset_table <- "acs5/profile"
  vars3$dataset_table <- "acs5/subject"
  vars4$dataset_table <- "acs5/cprofile"

  # Combine all table rows
  all_vars_meta <- rbind(vars1, vars2, vars3, vars4)

  return(all_vars_meta)
}

As we can see, the variable metadata being loaded in the function above only applies to the 5-year ACS results (acs5). If you are working with the 1-year results, the names would have to be changed (in that case, to acs1). After running the code above, you can enter the following command and provide a year as shown below:

# Creates a table of all the metadata called "meta_table"
meta_table <- all_acs_meta(year = 2021)

# Opens the newly made table
View(meta_table)

Here, you can see the first 6 rows of the table:

name         label                                    concept                   dataset_table
B01001A_001  Estimate!!Total:                         SEX BY AGE (WHITE ALONE)  acs5
B01001A_002  Estimate!!Total:!!Male:                  SEX BY AGE (WHITE ALONE)  acs5
B01001A_003  Estimate!!Total:!!Male:!!Under 5 years   SEX BY AGE (WHITE ALONE)  acs5
B01001A_004  Estimate!!Total:!!Male:!!5 to 9 years    SEX BY AGE (WHITE ALONE)  acs5
B01001A_005  Estimate!!Total:!!Male:!!10 to 14 years  SEX BY AGE (WHITE ALONE)  acs5
B01001A_006  Estimate!!Total:!!Male:!!15 to 17 years  SEX BY AGE (WHITE ALONE)  acs5

This will allow you to view all the table codes and search for specific codes. There are also descriptions for each variable code and what they are measuring.
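Besides browsing the table with View(), you can search it with code. The sketch below is one possible approach using base R's grepl(); it runs on a small stand-in data frame copied from the rows shown above so that it works without calling the Census API. With the real meta_table, the same filtering lines would apply unchanged.

```r
# Small stand-in for meta_table, copied from the rows shown above
meta_table <- data.frame(
  name = sprintf("B01001A_%03d", 1:6),
  label = c("Estimate!!Total:",
            "Estimate!!Total:!!Male:",
            "Estimate!!Total:!!Male:!!Under 5 years",
            "Estimate!!Total:!!Male:!!5 to 9 years",
            "Estimate!!Total:!!Male:!!10 to 14 years",
            "Estimate!!Total:!!Male:!!15 to 17 years"),
  concept = "SEX BY AGE (WHITE ALONE)",
  dataset_table = "acs5",
  stringsAsFactors = FALSE
)

# Case-insensitive keyword search in the concept column
sex_by_age <- meta_table[grepl("sex by age", meta_table$concept, ignore.case = TRUE), ]

# Exact-text search in the label column for one age bracket
under5 <- meta_table[grepl("Under 5 years", meta_table$label, fixed = TRUE), ]
under5$name  # "B01001A_003"
```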

Meta Table

Validating API Pulls Using Census Website

Once you get ACS data in R, it’s important to validate the data to ensure the numbers are what you intended to pull. An easy way to do this is by accessing the data from Census Data Tables.14 Search for the specific table code and narrow down to the locality you are looking for using the filter options on the left. Don’t forget to specify the ACS estimate you are looking for.

Meta Table

Clicking on the “American Community Survey” will open up the following, allowing you to select which estimate you are looking for.

Meta Table

Compare the numbers here with the ones in R to ensure they match.

Aggregating Across Counties

When pulling Census data at the county level using tidycensus, we can only pull data for individual counties, as the package contains no built-in way to aggregate values across a range of counties. To get combined values, we must pull the data for each county of interest and then manually aggregate the values across the counties.

Note: The example included in this section works well for table DP05 and other tables containing count data; however, counts, medians, and rates require different methods to aggregate across counties and care should be taken to ensure the correct aggregation method is applied for the respective metrics.

County Aggregation Example

We begin by defining our combined area of interest for the ACS data. In this case, we are interested in 2022 results for the combined localities of Charlottesville, VA (an independent city that the Census treats as a county equivalent) and Albemarle County, VA. We define these variables below:

# Census API key
census_api <- Sys.getenv("CENSUS_API_KEY")

# Year for acs5 data pull
year <- 2022

# County FIPS codes and name
county_codes <- c("003", "540") # locality FIPS codes desired
name <- "Charlottesville-Albemarle" # name of locality or combined region

We use the get_acs() function from tidycensus to pull the example variable, Race and Ethnicity from the ACS Demographic and Housing Characteristics tables [DP05]. Here is the code that achieves this result:

# Create list mapping our variable
var_DP05 <- list(
  "RaceEthnic_%_Black" = "DP05_0038", # Black/AA
  "RaceEthnic_%_AmerIndian" = "DP05_0039", # American Indian/Alaska Native
  "RaceEthnic_%_Asian" = "DP05_0044", # Asian
  "RaceEthnic_%_PacifIslan" = "DP05_0052", # Native Hawaiian/Pacific Islander
  "RaceEthnic_%_HispanLatin" = "DP05_0073", # Hispanic/Latino (any race)
  "RaceEthnic_%_White" = "DP05_0079" # White alone (not Hispanic/Latino)
)

# Get ACS data
acs_data_DP05 <- get_acs(geography = "county",
                    state = "VA",
                    county = county_codes,
                    variables = var_DP05,
                    summary_var = "DP05_0033", # this provides the total (summary table) we need for creating percents
                    year = year, 
                    survey = "acs5",
                    key = census_api
                    ) 

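We first create a named list that maps each category label to its variable code in the ACS tables. For example, the code “DP05_0038” is tied to the count of “Black or African American alone” race/ethnicity respondents. Although the labels contain “%”, the values pulled at this stage are counts; they are converted to percentages in a later step.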

From there, call the get_acs() function. The arguments provided as function inputs are the following:

  • geography = gives the level of geography; this can be national, state, county, etc.
  • state = here we specify the particular state we are considering
  • county = this is provided with a vector (in this case, called county_codes) of the FIPS codes for the counties we want to pull data from
  • variables = the named list of variable codes to pull (in this case, var_DP05); the names replace the raw codes in the variable column of the output
  • summary_var = allows us to define the variable code for the summary statistics; by including totals, we can calculate percentages and proportions from the counts for each group
  • year = sets the time period of the survey we want the values from
  • survey = defines the type of ACS survey we are interested in; here we want the 5-year ACS values (i.e. “acs5”)
  • key = this is where we provide the Census API key needed to make calls

We can see the table output here:

GEOID  NAME                            variable                  estimate  moe  summary_est  summary_moe
51003  Albemarle County, Virginia      RaceEthnic_%_Black        9966      419  112513       NA
51003  Albemarle County, Virginia      RaceEthnic_%_AmerIndian   125       86   112513       NA
51003  Albemarle County, Virginia      RaceEthnic_%_Asian        6319      267  112513       NA
51003  Albemarle County, Virginia      RaceEthnic_%_PacifIslan   34        48   112513       NA
51003  Albemarle County, Virginia      RaceEthnic_%_HispanLatin  6633      NA   112513       NA
51003  Albemarle County, Virginia      RaceEthnic_%_White        85372     282  112513       NA
51540  Charlottesville city, Virginia  RaceEthnic_%_Black        7945      350  46289        NA
51540  Charlottesville city, Virginia  RaceEthnic_%_AmerIndian   70        42   46289        NA
51540  Charlottesville city, Virginia  RaceEthnic_%_Asian        3237      201  46289        NA
51540  Charlottesville city, Virginia  RaceEthnic_%_PacifIslan   0         28   46289        NA
51540  Charlottesville city, Virginia  RaceEthnic_%_HispanLatin  2705      NA   46289        NA
51540  Charlottesville city, Virginia  RaceEthnic_%_White        30208     226  46289        NA

Now that we have two rows for each race/ethnicity group, one for each of the two localities we are aggregating, we need to group by the race/ethnicity categories and summarize the values. This can be done using group_by() and summarize() from the dplyr package within tidyverse, together with the functions for summarizing ACS data that come with the tidycensus package. Here is the aggregating code for our example:

# Create summary variables for the combined counties
# If run for a single locality, the values in the data table are unchanged
acs_data_DP05_summarize <- acs_data_DP05 %>% 
  group_by(variable) %>% 
  summarize(sum_est = sum(estimate),
            sum_moe = moe_sum(moe = moe, estimate = estimate),
            sum_all = sum(summary_est))

# Create percentages from estimates
acs_data_DP05_summarize <- acs_data_DP05_summarize %>% 
  mutate(value = round(((sum_est / sum_all) * 100), digits = 2),
         name = name) %>% 
  select(name, variable, value)

We are grouping by the variable column, which contains the race/ethnicity breakdown for the variable. This will aggregate the two counties together for each race/ethnicity. Then, with the summarize() function we are calculating the sums for each of the count values, including the estimated counts for each race/ethnicity category, and the sum of the total counts in each respective county. For the margin of error (moe), we can utilize a special tidycensus function called moe_sum(), which takes two arguments, the moe value and its respective point estimate.
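For intuition about what moe_sum() returns, the standard approximation for the margin of error of a sum is the square root of the sum of the squared margins of error. The base R sketch below applies that formula to the two Black or African American alone rows from the output table above. It is a simplified illustration: moe_sum() additionally uses the estimates to adjust for cases where several estimates are zero, which this version ignores.

```r
# Margins of error for the two localities (from the output table above)
moe <- c(Albemarle = 419, Charlottesville = 350)

# Root sum of squares: approximate margin of error of the summed estimate
combined_moe <- sqrt(sum(moe^2))
combined_moe  # approximately 545.95
```

Note that the combined margin of error is smaller than the simple sum of the two margins (769), because independent sampling errors partly offset each other.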

In the next code chunk, we convert the summed counts into percentages by dividing the race/ethnicity counts by the total counts to get the proportion and multiplying by 100. The name of the combined counties is added as a column for clarity.

Running all this provides the following output:

name                       variable                  value
Charlottesville-Albemarle  RaceEthnic_%_AmerIndian   0.12
Charlottesville-Albemarle  RaceEthnic_%_Asian        6.02
Charlottesville-Albemarle  RaceEthnic_%_Black        11.28
Charlottesville-Albemarle  RaceEthnic_%_HispanLatin  5.88
Charlottesville-Albemarle  RaceEthnic_%_PacifIslan   0.02
Charlottesville-Albemarle  RaceEthnic_%_White        72.78
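As a quick check on the output, and to illustrate the earlier note that counts and rates need different aggregation methods, the base R sketch below recomputes the combined Black or African American alone percentage by hand from the counts in the pulled table. It also shows what happens if the two county percentages are simply averaged, which is not equivalent because it ignores the difference in population size.

```r
# Counts and total populations taken from the pulled table above
black_count <- c(Albemarle = 9966, Charlottesville = 7945)
total_pop   <- c(Albemarle = 112513, Charlottesville = 46289)

# Appropriate for counts: sum numerators and denominators, then divide
combined_pct <- round(sum(black_count) / sum(total_pop) * 100, 2)

# Not equivalent: averaging the two county percentages
naive_pct <- round(mean(black_count / total_pop * 100), 2)

combined_pct  # 11.28, matching the output table
naive_pct     # 13.01
```

The gap between the two results arises because Albemarle County has more than twice the population of Charlottesville, so its lower percentage should carry more weight.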

Helpful Resources for ACS Work

Below are some resources which are helpful for exploring Census ACS data:


  1. https://www.census.gov/about/what/census-at-a-glance.html

  2. https://www.census.gov/programs-surveys/acs.html

  3. https://www.census.gov/data/developers/data-sets/acs-1year.html

  4. https://www.census.gov/data/developers/data-sets/acs-5year.html

  5. https://www.census.gov/programs-surveys/acs/guidance/estimates.html

  6. https://www.census.gov/topics/income-poverty/poverty/guidance/poverty-measures.html

  7. https://dof.ca.gov/wp-content/uploads/sites/352/Forecasting/Demographics/Documents/How_to_Recalculate_a_Median.pdf

  8. https://www.r-project.org/

  9. https://posit.co/download/rstudio-desktop/

  10. https://rstudio-education.github.io/hopr/starting.html

  11. https://www.stat.colostate.edu/~jah/talks_public_html/isec2020/installRStudio.html

  12. https://walker-data.com/tidycensus/

  13. https://api.census.gov/data/key_signup.html

  14. https://data.census.gov