The Census Bureau hasn’t provided deadlines or details for data products demonstrating its new method for protecting the privacy of 2020 census respondents, according to a Government Accountability Office report released Monday.
Differential privacy — systems that withhold information on people in datasets while publicly sharing data on group patterns — will be used with forthcoming census products like the demographic and housing characteristics file.
The bureau already employed differential privacy to mitigate the risk of census respondents being re-identified when it released redistricting data, used to redraw legislative boundaries every decade, in August. But GAO found there’s no way of knowing if that’s currently “realistic and achievable” with forthcoming data products.
“The success of a program depends in part on having a reliable schedule that defines when work will occur,” reads GAO’s report. “Without a specific and complete schedule, the bureau may be unable to accurately plan for and track progress on disclosure avoidance steps for future data products.”
The Decennial Directorate said it has not yet set deadlines for additional disclosure avoidance activities because its schedule is being updated in phases, but it expected to make “key decisions in the winter and spring.”
GAO recommended the bureau update its schedule of activities with specific timeframes because of their potential to impact “key features” of the 2030 census being decided over the next three years, and the agency agreed.
“The Census Bureau will prepare a formal action plan addressing this recommendation upon GAO’s issuance of the final report,” wrote the Department of Commerce, within which the bureau resides, in its response.
Previously the bureau mitigated indirect disclosure of personally identifiable information through data suppression, swapping and rounding, but advances in technology saw it identify a vulnerability in published 2010 census data in 2018. The bureau reconstructed the sex, age, race and ethnicity information of some people using that data, so it turned to differential privacy in 2020.
Since then the Data Stewardship Executive Policy Committee has held several meetings to discuss user outreach and make decisions around differential privacy, and the bureau has published several demonstration data products.
The bureau continues to assess 2020 census data quality with tools like the independent Post-Enumeration Survey (PES), a sampling of the population used to estimate the number of people and houses missed or counted more than once, as well as undercounts and overcounts of the population by demographic — with national estimates released March 10 and state estimates expected June 30, 2022.
Tool releases are delayed because the COVID-19 pandemic delayed census operations beginning with field data collection, but GAO raised concerns about 2020 census planning — including the development of new IT systems — back in 2017 when it was placed on the High-Risk List.
“[C]ontinued attention and oversight is warranted, as multiple data products have yet to be produced and key activities related to data privacy and quality remain to be completed,” GAO’s report reads.