Implications of Differential Privacy for Reported Data on Children in the 2020 U.S. Census

by
William P. O’Hare
President, O’Hare Data and Demographic Services LLC
Consultant to the Count All Kids 2020 Census Complete Count Committee

The U.S. Census Bureau is planning to use a new method called differential privacy (DP) to help protect the confidentiality and privacy of respondents in the 2020 Census. This paper examines how DP is likely to affect the accuracy of 2020 Census data for young children (ages 0 to 4). The analysis also examines other age groups of children in the context of school districts.

The U.S. Census Bureau is still refining its implementation of DP, but analysis of the most recent demonstration data available for young children shows that for several kinds of geographic units (counties, state legislative districts, school districts, places, and census tracts) the distortions DP injects to help protect privacy produce large errors for the population ages 0 to 4. For example, the Census Bureau’s May 2020 demonstration file showed that, after the application of DP, the 2010 Census count of children ages 0 to 4 would be off by 10 percent or more in about two-thirds (64 percent) of all census tracts. More than a quarter of tracts (28 percent) had errors of 25 percent or more for children ages 0 to 4.[1]

Data for school districts were also examined. For smaller age groups (i.e., age 4 or ages 0 to 4), the errors for school districts were substantial. For example, DP introduced errors of 10 percent or more for counts of children age 4 in 68 percent of school districts, and errors of 10 percent or more for counts of children ages 0 to 4 in 44 percent of school districts. For the population ages 5 to 17 and for ages 0 to 17, the error rates are lower.

Geographic areas with smaller populations tend to have higher levels of error injected by DP. This is important because the census is designed to produce data for many small geographic units. These errors are likely to cause problems in many use cases, such as determining the amount of state and federal funds received by school districts. A small school district that gets 10 percent less money than it deserves will face serious problems. It will be difficult for child advocates to support the use of DP in the 2020 Census if it produces errors like those identified in this paper.
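
The link between population size and relative error can be illustrated with a small simulation. The sketch below is not the Census Bureau's method: it assumes simple Laplace noise with a made-up scale (NOISE_SCALE) added to a single count, and the two population sizes are hypothetical. It only shows that noise of the same magnitude is a far larger share of a small count than of a large one.

```python
import random


def laplace_noise(rng, scale):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def mean_abs_percent_error(true_count, scale, trials, rng):
    """Average absolute percent error after adding noise to one true count."""
    total = 0.0
    for _ in range(trials):
        noisy = true_count + laplace_noise(rng, scale)
        total += abs(noisy - true_count) / true_count * 100
    return total / trials


rng = random.Random(2020)  # fixed seed so the run is repeatable
NOISE_SCALE = 5.0          # hypothetical value, not a Census Bureau parameter

# Hypothetical areas: 50 young children versus 5,000 young children.
small_area = mean_abs_percent_error(50, NOISE_SCALE, 10_000, rng)
large_area = mean_abs_percent_error(5_000, NOISE_SCALE, 10_000, rng)

print(f"small area: {small_area:.2f}% average error")
print(f"large area: {large_area:.2f}% average error")
```

Under these assumptions the average percent error for the small area is roughly 100 times that of the large area, because the same noise is divided by a count 100 times smaller.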

The final decision about the use of DP in the 2020 Census is likely to be made in December 2020 or January 2021, and the U.S. Census Bureau is still looking for feedback from data users.  Comments can be sent to [email protected].

[1] In this analysis, errors are the difference between DP-infused 2010 Census data and the 2010 Census data without DP.
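
As a rough illustration of how figures such as "errors of 10 percent or more in 64 percent of tracts" can be derived from this definition, the sketch below computes the absolute percent difference for each geographic unit and then the share of units at or above a threshold. The function names and the five tract counts are hypothetical, and treating the error as an absolute percent difference and skipping units with an original count of zero are assumptions of this sketch, not details taken from the paper.

```python
def percent_error(original, dp_count):
    """Absolute percent difference between the DP-infused and original count."""
    return abs(dp_count - original) / original * 100


def share_with_error_at_least(pairs, threshold_pct):
    """Percent of units whose absolute percent error meets the threshold.

    `pairs` is a list of (original_count, dp_count) tuples. Units with an
    original count of zero are skipped (an assumption of this sketch),
    because percent error is undefined for them.
    """
    errors = [percent_error(o, d) for o, d in pairs if o > 0]
    if not errors:
        return 0.0
    flagged = sum(1 for e in errors if e >= threshold_pct)
    return 100 * flagged / len(errors)


# Hypothetical counts of children ages 0 to 4 for five tracts (not real data).
tracts = [(200, 210), (80, 100), (40, 43), (500, 498), (20, 12)]
share_10 = share_with_error_at_least(tracts, 10)
share_25 = share_with_error_at_least(tracts, 25)

print(f"{share_10:.0f}% of tracts have errors of 10 percent or more")
print(f"{share_25:.0f}% of tracts have errors of 25 percent or more")
```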
