Analysis of Census Bureau’s August 2021 Differential Privacy Demonstration Product: Implications for Data on Children

The U.S. Census Bureau is using a new method called differential privacy (DP) to help protect the confidentiality and privacy of respondents in the 2020 Census. This paper provides information on how DP is likely to affect the accuracy of data for children (population ages 0 to 17) in the 2020 Census. The study is based on analysis of the most recent DP Demonstration Product, which the Census Bureau released on August 12, 2021. The August 2021 demonstration product supersedes earlier DP demonstration products and uses the same DP parameters as those used to produce the redistricting data (sometimes called the Public Law, or P.L., 94-171 file), which the Census Bureau also issued on August 12, 2021.

To be clear, decisions about the use of DP in the redistricting data have already been made, so this analysis will have no impact on those decisions or those data. In June 2021, Census Bureau leadership determined the final accuracy parameters for the redistricting data (P.L. 94-171) after a long process that included a series of demonstration products and user feedback.

Like all disclosure avoidance systems, the use of DP involves a trade-off between privacy protection and census accuracy. There have always been errors in Census data, but in the 2020 Census the Census Bureau is deliberately adding error in order to enhance privacy protection. The Census Bureau controls the balance between accuracy and privacy protection in the 2020 Census largely by adjusting a parameter called “epsilon.” Increasing epsilon will increase accuracy in most cases, but it will also lower the level of privacy protection. After release of the April 2021 demonstration product, many stakeholders, including the Count All Kids Campaign, urged the Census Bureau to make the data more accurate. The Census Bureau subsequently responded to those requests and set epsilon at 19.61 for the redistricting file to increase the accuracy of the figures.
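
The relationship between epsilon and accuracy can be illustrated with a minimal sketch. It uses the textbook Laplace mechanism, in which noise with scale 1/epsilon is added to a single count. This is an assumption made for illustration only: it is not the Census Bureau's production system, which divides its privacy-loss budget across many tables and geographic levels and then post-processes the results, so the redistricting file's epsilon of 19.61 cannot simply be plugged in here. The function names and the example count of 100 children are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    # Textbook Laplace mechanism (illustrative assumption, not the Bureau's
    # method): the noise scale is sensitivity / epsilon.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=20000, seed=0):
    # Average absolute error over many simulated noisy releases of one count.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials


# Hypothetical true count of 100 children in one small area.
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps}: mean absolute error = {mean_abs_error(100, eps):.2f}")
```

In this simplified setting the average absolute error is roughly 1/epsilon, so a tenfold increase in epsilon cuts the typical error by about a factor of ten, at the cost of weaker privacy protection.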

The August 2021 demonstration product file applied DP to 2010 Census data, thus allowing comparisons of data with and without DP. The analysis presented in this paper found little impact of DP for large (highly aggregated) geographic units like states or large counties. However, the story is different for smaller places. Many smaller areas have high levels of error in the number of children. For example, based on analysis of the August 2021 DP file, the count of children would exhibit an absolute percent error of 5 percent or more in about 7 percent of Unified School Districts after DP is applied. Bigger absolute percent errors are evident for several minority child populations. The data also show that 61 percent of Unified School Districts had absolute numeric errors of 10 or more children and 7 percent of Unified School Districts had errors of 50 or more children. Errors of this magnitude could have implications for federal and state funding received by schools and for educational planning. More analysis is needed on this point.
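
The two error measures used above can be sketched as follows, assuming each geographic unit has a child count from the original 2010 data and a child count after DP is applied. Absolute numeric error is the absolute difference between the two counts, and absolute percent error expresses that difference as a percentage of the original count. The district counts below are made up for illustration and are not taken from the demonstration product; the function names are likewise hypothetical.

```python
def abs_numeric_error(original, dp):
    # Absolute difference between the DP count and the original count.
    return abs(dp - original)


def abs_percent_error(original, dp):
    # Absolute difference as a percentage of the original count.
    # Undefined when the original count is zero, so return None in that case.
    if original == 0:
        return None
    return 100.0 * abs(dp - original) / original


def share_at_or_above(values, threshold):
    # Percentage of units whose (defined) error meets or exceeds the threshold.
    usable = [v for v in values if v is not None]
    return 100.0 * sum(v >= threshold for v in usable) / len(usable)


# Hypothetical (original, DP) child counts for five school districts.
districts = [(1200, 1195), (85, 97), (40, 43), (15000, 15012), (300, 352)]

numeric_errors = [abs_numeric_error(o, d) for o, d in districts]
percent_errors = [abs_percent_error(o, d) for o, d in districts]

print("Share with percent error >= 5:", share_at_or_above(percent_errors, 5))
print("Share with numeric error >= 10:", share_at_or_above(numeric_errors, 10))
print("Share with numeric error >= 50:", share_at_or_above(numeric_errors, 50))
```

With these made-up counts, three of the five districts have an absolute percent error of 5 percent or more, three have a numeric error of 10 or more children, and one has an error of 50 or more. The example also shows why the two measures need to be read together: a 12-child error is negligible in a district of 15,000 children but large in a district of 85.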

The injection of DP into the 2010 Census data included in the August 2021 Demonstration Product resulted in more than 160,000 blocks nationwide that had population ages 0 to 17 but no population ages 18 or over. In the data without DP injected, there were only a few hundred such blocks nationwide. Blocks with children and no adults are highly implausible, and a large number of such blocks in the 2020 Census may undermine confidence in the overall Census results. These implausible results are likely due to children being separated from their parents in DP processing. This separation of children from parents in the data processing is an ongoing concern for data on children.
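
A check for such blocks can be sketched as follows, assuming each block record carries a total population and a population ages 18 and over, so that the child population is the difference between the two. The record layout, function name, and sample values are hypothetical and are not drawn from the demonstration product.

```python
def child_only_blocks(blocks):
    # `blocks` is an iterable of (block_id, total_population, population_18_plus).
    # A block is flagged when it has at least one child and no adults.
    flagged = []
    for block_id, total, adults in blocks:
        children = total - adults
        if children > 0 and adults == 0:
            flagged.append(block_id)
    return flagged


# Hypothetical block records: (block_id, total population, population 18+).
sample_blocks = [
    ("A", 120, 90),  # adults and children
    ("B", 4, 0),     # children only
    ("C", 0, 0),     # unpopulated
    ("D", 7, 7),     # adults only
    ("E", 2, 0),     # children only
]

flagged = child_only_blocks(sample_blocks)
print(len(flagged), "child-only blocks:", flagged)
```

Running the same check on the data with and without DP, and comparing the two tallies, is one way a data user could look for this kind of implausible result in a local area.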

The negative implications of DP for small areas and small populations are important because DP will be used in the remaining 2020 Census data files, including the Demographic Profile file, the Demographic and Housing Characteristics (DHC) file, and the Detailed Demographic and Housing Characteristics (D-DHC) file. Those files will provide data for smaller population groups, such as children ages 0 to 4 by race. Given the larger impact of DP on smaller groups, it is important to monitor the quality of data for children in future 2020 Census files.

This paper is meant to provide stakeholders and child advocates with some fundamental information about the level of errors DP will inject into the 2020 Census data for the population ages 0 to 17.  It is intended to help stakeholders gain a better understanding of the implications of DP for 2020 Census data on children and enable stakeholders to use 2020 Census data responsibly.  

There are two reasons for sharing this information with child advocates now. First, 2020 Census results for some localities may include situations where the number of young children reported looks suspect. It is important to make sure child advocates are aware of the potential impact of DP so they can explain odd child statistics to local leaders.

The second reason for sharing this information with state and local child advocates is that the U.S. Census Bureau is still looking for feedback on the use of DP in the 2020 Census. In particular, it is looking for cases where census data are used to make decisions. The Census Bureau is asking data users to examine the DP demonstration products to see whether the error injected by DP makes the data unfit for their use case.

There is some latitude in how much error the Census Bureau injects into the data for future products, so feedback from census data users is important. If many users feel the current level of accuracy for data on children is not sufficient for some uses, there is a chance the Census Bureau could make the data more accurate in future 2020 Census products.

If readers know of situations where census data are used for decision-making, they should notify the Census Bureau. General information is fine, but information about which specific demographic characteristic(s) are used at which geographic level is even better. Thoughts and reactions to the use of DP in the 2020 Census can be sent to