Analysis of Census Bureau’s August 2022 Differential Privacy Demonstration Product: Implications for Data on Young Children

Executive Summary

The U.S. Census Bureau is using a new method called differential privacy (DP) to help protect the confidentiality and privacy of respondents in the 2020 Census. This paper provides some information on how the use of DP in the 2020 Census is likely to impact the accuracy of data for young children (population ages 0 to 4).

The study is based on analysis of the most recent DP Demonstration Product, released by the Census Bureau on August 25, 2022. This product supersedes earlier DP Demonstration Products and focuses on data that will be in the 2020 Census Demographic and Housing Characteristics (DHC) file, which is scheduled to be released in May 2023.

The DHC file has most of the tables that were in Summary File 1 of the 2010 Census.  The Demonstration Product released in August 2022 has data for population and housing units, but this analysis only examines data from the population file.

This paper presents analysis of the error introduced by DP by comparing the data as reported in the 2010 Census Summary File to the same data after the application of DP. According to the Census Bureau, the demonstration file released in August 2022 has been optimized for major use cases of the DHC tables.

Analysis presented in this paper found little impact of DP on data for young children for large (highly aggregated) geographic units like states or large counties. However, the story is different for smaller geographic units. Many smaller areas have high levels of error in their data on young children after DP is applied. For example, the count of young children would have an absolute percent error of 5 percent or more in about 18 percent of Unified School Districts after DP is applied. The data also show that 64 percent of Unified School Districts had absolute numeric errors of 5 or more young children after DP is applied.
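The two error measures used above can be made concrete with a minimal sketch. It assumes the absolute numeric error is the absolute difference between the count after DP and the count as reported in the 2010 Census, and that the absolute percent error is that difference divided by the reported count. The district names and counts are made up for illustration, and skipping units with a reported count of zero is an assumption of this sketch, not a rule stated in this paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UnitCount:
    """Count of young children for one geographic unit (hypothetical data)."""
    name: str
    published: int  # count as reported in the 2010 Census
    dp: int         # count after DP is applied


def abs_numeric_error(u: UnitCount) -> int:
    """Absolute difference between the DP count and the reported count."""
    return abs(u.dp - u.published)


def abs_percent_error(u: UnitCount) -> Optional[float]:
    """Absolute numeric error as a percent of the reported count.

    Returns None when the reported count is zero, because the percent
    is undefined there (an assumption of this sketch).
    """
    if u.published == 0:
        return None
    return 100.0 * abs_numeric_error(u) / u.published


def share_at_or_above(values: List[Optional[float]], threshold: float) -> float:
    """Percent of units with a defined error at or above the threshold."""
    defined = [v for v in values if v is not None]
    if not defined:
        return 0.0
    return 100.0 * sum(v >= threshold for v in defined) / len(defined)


# Made-up units, not taken from the Demonstration Product.
units = [
    UnitCount("District A", 1000, 1004),
    UnitCount("District B", 200, 188),
    UnitCount("District C", 40, 43),
    UnitCount("District D", 5000, 4990),
]

pct_errors = [abs_percent_error(u) for u in units]
num_errors = [float(abs_numeric_error(u)) for u in units]

share_pct_5 = share_at_or_above(pct_errors, 5.0)  # share with 5 percent or more
share_num_5 = share_at_or_above(num_errors, 5.0)  # share off by 5 or more children
print(share_pct_5, share_num_5)
```

In this toy example a large district can have a sizeable numeric error and a tiny percent error, while a small district can show the reverse, which is why both measures are reported.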

Errors of the magnitude shown above could have important implications for federal and state funding received by schools and for educational planning. Errors of this magnitude might affect formula funding that is based on Census-derived data, with some schools getting less than they deserve.

Bigger absolute percent errors are evident for Hispanic, Black, and Asian young children in Unified School Districts. The mean absolute percent error for Non-Hispanic White young children was 5 percent, compared to 28 percent for Hispanic young children, 35 percent for Black young children, and 45 percent for Asian young children. Differential accuracy among race and Hispanic Origin groups raises questions of data equity after DP is applied.
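As a rough illustration of how a mean absolute percent error by group can be computed, the sketch below averages the unit-level absolute percent errors within each group. The group labels and counts are hypothetical, and units with a reported count of zero are skipped as an assumption of this sketch. The example shows one arithmetic reason percent errors can be larger for groups with small counts (the same numeric error is a larger share of a small count); it is not offered as the explanation for the results reported here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each row is (group label, reported count, count after DP) for one unit.
Row = Tuple[str, int, int]


def mean_abs_percent_error_by_group(rows: Iterable[Row]) -> Dict[str, float]:
    """Average the unit-level absolute percent errors within each group."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for group, published, dp in rows:
        if published == 0:
            continue  # percent error undefined; skipped by assumption
        totals[group] += 100.0 * abs(dp - published) / published
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows: "Group X" has large counts per unit, "Group Y" small ones.
rows = [
    ("Group X", 400, 404),
    ("Group X", 600, 582),
    ("Group Y", 10, 13),
    ("Group Y", 20, 16),
]

mape = mean_abs_percent_error_by_group(rows)
print(mape)
```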

I also examined the errors for the population of children age 4 and found that errors for a single year of age are particularly large. I found that 52 percent of Unified School Districts had absolute percent errors of 5 percent or more for children age 4, and 59 percent had absolute numeric errors of 5 or more children age 4.

The results are similar for Places. Analysis shows that 46 percent of Places (cities, villages, and towns) had absolute percent errors of 5 percent or more for ages 0 to 4, and 38 percent of Places had absolute numeric errors of 5 or more young children.

I believe the most important errors introduced by the application of DP are the large errors introduced for some geographic units. Analysis shows that 2 percent of Unified School Districts have absolute percent errors of 25 percent or more. In terms of numeric errors, 5 percent of Unified School Districts have absolute numeric errors of 25 or more young children. I urge the Census Bureau to take steps to reduce or eliminate these large errors, because I believe the large errors injected by DP will be the most problematic.

The application of DP also caused a number of impossible or improbable results. After the injection of DP into the 2010 Census data included in the August 2022 Census Bureau Demonstration Product (U.S. Census Bureau 2022d, Table 18), there were 163,077 blocks nationwide (1.5 percent of all blocks) that had population ages 0 to 17 but no population ages 18 or over, compared to 82 such blocks before DP was applied. This result has two important implications.

First, blocks with children and no adults are highly implausible, and the large number of such blocks may undermine confidence in the overall Census results.

Second, these implausible results are likely due to young children being separated from their parents in 2020 Census DHC processing with DP. This separation of children and parents in data processing is an ongoing concern for data on young children and the production of future tables for children. This issue is particularly important for the introduction of DP into the American Community Survey, which is a key source of child well-being measures (O’Hare 2022b). To understand the well-being of children, it is critical to understand the situation of a child’s parents or caretakers. Moreover, if the same separation of children from their parents and caregivers occurs in the application of DP to the American Community Survey, it will eliminate reliable child poverty data, which are based on household income. Child poverty rates are one of the most important measures of child well-being.

Based on the errors for the young child population with the privacy parameters for DP used in the August 2022 DP Demonstration Product, and the lack of clarity about the level of privacy protection from DP, I recommend the Census Bureau take steps to reduce the size of errors injected into the 2020 Census DHC file and, in particular, focus on reducing or eliminating the large errors.

This paper is meant to provide stakeholders and child advocates with some fundamental information about the level of error DP is likely to inject into the 2020 Census data for the population ages 0 to 4. There are two reasons for sharing this information with child advocates now. First, the 2020 Census results for some localities may include situations where the number of young children reported looks suspect. It is important to make sure child advocates are aware of the potential impact of DP so they can explain odd child statistics to local leaders.

There is a second reason for sharing this information with state and local child advocates. The U.S. Census Bureau is looking for feedback on the use of DP in the 2020 Census. The Census Bureau is looking for cases where census data are used to make decisions, and it is asking data users to examine the DP Demonstration Product to see if the error injected by DP makes the data unfit for use. After reading this report, I hope you will convey your thoughts to the Census Bureau.

There is some latitude in how much error the Census Bureau will inject into the DHC files, so feedback from census data users is important. If many users feel the data on young children in the DP Demonstration Product are not accurate enough for some uses, there is a chance the Census Bureau could make the final data more accurate.

Stakeholders, child advocates, and data users should take advantage of this opportunity to communicate their thoughts to the Census Bureau before the Census Bureau’s Data Stewardship Advisory Committee makes a final decision on the privacy parameters to be used in the DHC file when it is released in May of 2023. Comments on the implications of DP in the August 2022 Demonstration File are due September 26, 2022. Comments and responses can be sent to