Denoising the US Census: Succinct Block Hierarchical Regression
Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Adam Sealfon

TL;DR
This paper introduces BlueDown, a hierarchical regression method that improves both the accuracy and the efficiency of census data post-processing under the same privacy guarantees, addressing the computational challenges posed by large-scale data.
Contribution
We develop a statistically optimal hierarchical regression algorithm that runs in linear time and incorporates structural constraints, improving the utility of census data without weakening its privacy guarantees.
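The linear-time claim can be illustrated with a classical special case. The sketch below is not BlueDown itself. It assumes one scalar count per geographic unit, a tree-shaped hierarchy, independent Gaussian noise with known variance on each measurement, and no structural constraints. Under those assumptions it computes the weighted least-squares (best linear unbiased) estimate using one bottom-up pass and one top-down pass. The names `Node` and `tree_blue` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical geographic unit: one scalar count; names must be unique."""
    name: str
    y: Optional[float] = None    # noisy measurement of this unit's total; None = unmeasured
    var: Optional[float] = None  # assumed-known Gaussian noise variance of y (must be > 0)
    children: List["Node"] = field(default_factory=list)


def tree_blue(root: Node) -> Dict[str, float]:
    """Weighted least-squares estimate of every node's total in O(#nodes) time.

    Simplifying assumptions: independent noise, every leaf measured.
    Outputs are consistent: each parent equals the sum of its children.
    """
    z: Dict[str, float] = {}  # estimate of a node's total from its own subtree only
    w: Dict[str, float] = {}  # variance of that subtree-only estimate

    def up(v: Node) -> None:
        if not v.children:
            if v.y is None or v.var is None:
                raise ValueError(f"leaf {v.name!r} has no measurement")
            z[v.name], w[v.name] = v.y, v.var
            return
        for c in v.children:
            up(c)
        s = sum(z[c.name] for c in v.children)  # children-sum estimate of v's total
        u = sum(w[c.name] for c in v.children)  # its variance (subtrees are independent)
        if v.y is None or v.var is None:
            z[v.name], w[v.name] = s, u
        else:
            # Inverse-variance weighting of v's own measurement and the children sum.
            prec = 1.0 / v.var + 1.0 / u
            z[v.name] = (v.y / v.var + s / u) / prec
            w[v.name] = 1.0 / prec

    est: Dict[str, float] = {}

    def down(v: Node) -> None:
        if not v.children:
            return
        s = sum(z[c.name] for c in v.children)
        u = sum(w[c.name] for c in v.children)
        r = est[v.name] - s  # discrepancy between final parent value and children sum
        for c in v.children:
            # Each child absorbs a share of the discrepancy proportional to its variance.
            est[c.name] = z[c.name] + (w[c.name] / u) * r
            down(c)

    up(root)
    est[root.name] = z[root.name]  # the root's subtree is the whole tree
    down(root)
    return est


state = Node("state", y=12.0, var=1.0, children=[
    Node("county1", y=3.0, var=1.0),
    Node("county2", y=6.0, var=1.0),
])
print(tree_blue(state))  # {'state': 11.0, 'county1': 4.0, 'county2': 7.0}
```

Recursion depth equals the depth of the hierarchy rather than the number of units, so a shallow geographic tree is handled without deep recursion. Each node is visited a constant number of times per pass, which is where the linear running time comes from in this simplified setting.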
Findings
BlueDown outperforms TopDown in accuracy at county and tract levels.
The new algorithm reduces computational complexity to linear time.
Structural constraints are effectively integrated into the regression process.
Abstract
The US Census Bureau Disclosure Avoidance System (DAS) balances confidentiality and utility requirements for the decennial US Census (Abowd et al., 2022). The DAS was used in the 2020 Census to produce demographic datasets critically used for legislative apportionment and redistricting, federal and state funding allocation, municipal and infrastructure planning, and scientific research. At the heart of DAS is TopDown, a heuristic post-processing method that combines billions of private noisy measurements across six geographic levels in order to produce new estimates that are consistent, more accurate, and satisfy certain structural constraints on the data. In this work, we introduce BlueDown, a new post-processing method that produces more accurate, consistent estimates while satisfying the same privacy guarantees and structural constraints. We obtain especially large accuracy…
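The regression view of post-processing can be made concrete with a toy generalized least-squares fit. This sketch uses a hypothetical state with three counties and made-up measurements, and it assumes independent Gaussian noise with known variances, which simplifies the production setting. The fitted county values are automatically consistent with the state total, and their variances (≈ 0.75 each here) are lower than the raw measurement variance of 1. The dense solve used here scales cubically with the number of unknowns. This is one reason specialized methods are needed when billions of measurements are involved.

```python
import numpy as np

# Toy hierarchy (hypothetical): one state with three counties.
# Unknowns are the county counts x = (x1, x2, x3); the state total is their sum.
A = np.array([
    [1.0, 1.0, 1.0],  # state-level measurement observes x1 + x2 + x3
    [1.0, 0.0, 0.0],  # county 1
    [0.0, 1.0, 0.0],  # county 2
    [0.0, 0.0, 1.0],  # county 3
])
y = np.array([20.0, 4.0, 5.0, 7.0])   # noisy measurements (made-up numbers)
var = np.array([1.0, 1.0, 1.0, 1.0])  # assumed-known noise variances


def gls(A: np.ndarray, y: np.ndarray, var: np.ndarray):
    """Generalized least squares with a diagonal noise covariance."""
    w = 1.0 / np.sqrt(var)  # whitening weights
    x_hat, *_ = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)
    cov = np.linalg.inv(A.T @ (A / var[:, None]))  # (A^T W A)^{-1}, W = diag(1/var)
    return x_hat, cov


x_hat, cov = gls(A, y, var)
print(x_hat)         # approximately [5, 6, 8]
print(np.diag(cov))  # approximately [0.75, 0.75, 0.75]
print(A[0] @ x_hat)  # state estimate, equal to the sum of the county estimates
```

With equal variances, each county estimate shifts its raw count by a quarter of the discrepancy between the state measurement and the sum of the county measurements (here 20 − 16 = 4).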
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Census and Population Estimation · Data Quality and Management
