Analysis of Census Bureau’s April 2021 Differential Privacy Demonstration Product: Implications for Data on Children

Thoughts and reactions to the data in the DP demonstration file issued in April 2021 are due by May 28, 2021. Comments and responses can be sent to [email protected]. Please put “April 2021 Demonstration Data” in the subject line of the email.

The U.S. Census Bureau is planning to use a new method called differential privacy (DP) when it releases data from the 2020 Census, to help protect the confidentiality and privacy of respondents. This paper provides information on how DP is likely to affect the accuracy of data for children (population ages 0 to 17) in the 2020 Census. The study is based on analysis of the most recent DP demonstration product released by the Census Bureau in April 2021, which applied DP to 2010 Census data. The DP demonstration product issued in April 2021 supersedes four earlier DP demonstration products.

This paper is meant to give stakeholders and child advocates basic information about the level of error DP will inject into 2020 Census data for the population ages 0 to 17. It aims to help stakeholders better understand the implications of DP for children and to enable data users to give the Census Bureau constructive feedback on its use of DP. In June 2021, Census Bureau leadership will determine the final accuracy parameters for the redistricting data (P.L. 94-171) to be released by September 30, 2021.

According to the Census Bureau, the demonstration file released on April 28, 2021 has been optimized for the redistricting application. However, 2020 Census files released after the redistricting data, such as the Demographic Profiles and the Demographic and Housing Characteristics files, will contain more detailed data on children, and those data are likely to be made consistent with the total number of 0- to 17-year-olds reported in the redistricting data. Errors in the redistricting counts of 0- to 17-year-olds will therefore have implications for child data in later 2020 Census files. In that sense, analysis of the redistricting data offers some understanding of the likely accuracy of later 2020 Census data products on children. The Census Bureau has indicated it hopes to engage stakeholders in decisions about what data to include in those subsequent files and what privacy parameters to use for them.

To its credit, the Census Bureau has quantified its accuracy target for the redistricting data it will release in August or September 2021: “…we created an accuracy target to ensure that the largest racial or ethnic group in any geographic entity with a total population of at least 500 people is accurate to within 5 percentage points of their enumerated value at least 95% of the time.” This leaves open what will happen to geographic units with fewer than 500 people, and how large the errors will be for the 5 percent of cases that miss the 5-percentage-point target.

This paper analyzes the error introduced by DP by comparing data as reported in the 2010 Census Summary File with the same data after DP was applied, as released in the April 2021 demonstration file. The analysis found little impact from DP for large (highly aggregated) geographic units such as states or large counties. However, the story is different for smaller places, many of which have high levels of error. For example, after DP is applied, the count of children shows an absolute percent error of 5 percent or more in about 8 percent of Unified School Districts. Larger absolute percent errors are evident for several minority child populations. The data also show that 66 percent of Unified School Districts had absolute numeric errors of 10 or more children. Errors of this magnitude could affect the federal and state funding schools receive and educational planning. Similarly, 44 percent of places (cities, villages, and towns) had absolute percent errors of 5 percent or more, and 56 percent of places had absolute numeric errors of 10 or more children.
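The two error measures used above can be sketched as follows. This is a minimal illustration, assuming each unit's 2010 Census Summary File child count has already been matched to its count in the DP demonstration file; the class and function names and the district figures are hypothetical, and the paper's own calculations may handle details such as zero counts differently.

```python
from dataclasses import dataclass


@dataclass
class UnitCounts:
    """Child (ages 0 to 17) counts for one geographic unit (hypothetical layout)."""
    name: str
    census_2010: int  # count as published in the 2010 Census Summary File
    dp_demo: int      # same count after DP, from the demonstration file


def absolute_numeric_error(u: UnitCounts) -> int:
    # Number of children added or removed by DP, ignoring direction.
    return abs(u.dp_demo - u.census_2010)


def absolute_percent_error(u: UnitCounts) -> float:
    # Error relative to the published count; undefined when that count is 0.
    return 100.0 * absolute_numeric_error(u) / u.census_2010


def share_with_large_errors(units, ape_threshold=5.0, numeric_threshold=10):
    """Return (share with percent error >= threshold, share with numeric error >= threshold).

    Percent errors are computed only for units with a nonzero published count.
    """
    if not units:
        return 0.0, 0.0
    nonzero = [u for u in units if u.census_2010 > 0]
    ape_share = (
        sum(absolute_percent_error(u) >= ape_threshold for u in nonzero) / len(nonzero)
        if nonzero
        else 0.0
    )
    num_share = sum(
        absolute_numeric_error(u) >= numeric_threshold for u in units
    ) / len(units)
    return ape_share, num_share


# Hypothetical school districts, not actual Census figures.
units = [
    UnitCounts("District A", 1200, 1212),  # off by 12 children, 1.0 percent
    UnitCounts("District B", 150, 141),    # off by 9 children, 6.0 percent
    UnitCounts("District C", 40, 52),      # off by 12 children, 30.0 percent
    UnitCounts("District D", 5000, 4995),  # off by 5 children, 0.1 percent
]
print(share_with_large_errors(units))  # prints (0.5, 0.5)
```

The toy data show why both measures matter: a large district can miss by a dozen children with a small percent error, while a small district can have a large percent error from only a few children.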

Moreover, after DP was applied to the 2010 Census data in the April demonstration product, over 91,000 blocks nationwide had population ages 0 to 17 but no population ages 18 or over. Blocks with children but no adults are highly implausible, and the large number of such blocks may undermine confidence in the overall census results. These implausible results are likely due to children being separated from their parents in DP processing, which remains an ongoing concern for data on children.
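The plausibility check described above amounts to a simple filter over block-level counts. The sketch below is a minimal illustration, assuming post-DP counts of children (ages 0 to 17) and adults (ages 18 and over) are available for each block; the function name, the block labels, and the counts are hypothetical and are not drawn from the demonstration file.

```python
from __future__ import annotations


def blocks_with_children_no_adults(blocks: dict[str, tuple[int, int]]) -> list[str]:
    """Return sorted IDs of blocks with children but no adults.

    Each value is (population ages 0 to 17, population ages 18 and over)
    for one census block after DP; this layout is hypothetical.
    """
    return sorted(
        block_id
        for block_id, (children, adults) in blocks.items()
        if children > 0 and adults == 0
    )


# Hypothetical post-DP block counts.
dp_blocks = {
    "block-001": (3, 0),  # implausible: children, no adults
    "block-002": (0, 0),  # empty block, not flagged
    "block-003": (4, 7),  # plausible
    "block-004": (1, 0),  # implausible
}
flagged = blocks_with_children_no_adults(dp_blocks)
print(len(flagged), flagged)  # prints 2 ['block-001', 'block-004']
```

Run against every block in a state or in the nation, the length of the returned list gives a count comparable to the one reported above.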

Based on the errors in child population counts at the level of DP used in the April 2021 demonstration product, and the lack of clarity about the privacy protection DP provides, I recommend that the Census Bureau reduce the size of the errors injected into the 2020 Census data.

There are two reasons for sharing this information with child advocates now. First, when the 2020 Census results are published, there may be some localities where the reported number of young children looks suspect. It is important that child advocates are aware of the potential impact of DP so they can explain odd child statistics to local leaders.

The second reason for sharing this information with state and local child advocates is that, as noted earlier, the U.S. Census Bureau is still seeking feedback on the use of DP in the 2020 Census. It is looking for cases where census data are used to make decisions, and it is asking data users to examine the April 2021 DP demonstration product to see whether the error injected by DP makes the data unfit for use. After reading this report, we hope you will convey your thoughts to the Census Bureau. The Census Bureau has some latitude in how much error it injects into the data, so feedback from census data users is important. If many users find that data on children are not accurate enough for some uses, there is a chance the Census Bureau could make the data more accurate.

The demonstration product released on April 28, 2021 is the last one the Census Bureau will release before its Data Stewardship Executive Policy Committee decides on the DP parameters for the redistricting data (P.L. 94-171 file), which will be released by September 30, 2021 (a version of this file may be made available in August 2021).

Stakeholders, child advocates, and data users should take advantage of this opportunity to communicate their thoughts to the Census Bureau before a final decision is made. Let the Census Bureau know how the errors injected by DP are likely to affect your work and the lives of children in your state or community.
