The Effect of Data Swapping on Analyses of American Community Survey Data
Nicolas Kim

TL;DR
This paper investigates how data swapping, a privacy-preserving technique used in census data, impacts the accuracy of contingency table analyses, revealing that targeted swapping can distort joint distributions of categorical variables.
Contribution
It provides an empirical analysis of data swapping effects on American Community Survey data, highlighting how targeted swapping affects data utility.
Findings
Data swapping alters joint distributions of categorical variables.
When swapping targets at-risk individuals, its effect on joint distributions becomes hard to interpret as the swap rate varies.
Effects depend on the swapping rate and targeting criteria.
Abstract
Researchers from a growing range of fields and industries rely on public-access census data. These data are altered by census-taking agencies to minimize the risk of identification; one such disclosure avoidance measure is the data swapping procedure. I study the effects of data swapping on contingency tables using a dummy dataset, public-use American Community Survey (ACS) data, and restricted-use ACS data accessed within the U.S. Census Bureau. These simulations demonstrate that as the rate of swapping is varied, the effect on joint distributions of categorical variables is no longer understandable when the data swapping procedure attempts to target at-risk individuals for swapping using a simple targeting criterion.
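To make the idea concrete, the sketch below applies a simplified, untargeted random swap to a toy dataset and measures how far the resulting two-way contingency table drifts from the original. It is an illustration only, not the Census Bureau's procedure or the paper's simulation code: the function names (`make_dummy`, `swap_records`, `contingency`, `total_variation`), the two variables `region` and `income`, and the use of total variation distance as the distortion measure are all assumptions introduced here.

```python
import random
from collections import Counter


def make_dummy(n, rng):
    """Build a toy dataset in which income depends on region (hypothetical data)."""
    records = []
    for _ in range(n):
        region = rng.choice(["A", "B"])
        p_low = 0.8 if region == "A" else 0.2
        income = "low" if rng.random() < p_low else "high"
        records.append({"region": region, "income": income})
    return records


def swap_records(records, swap_attr, rate, rng):
    """Return a copy in which roughly a `rate` fraction of rows exchange
    their `swap_attr` value with a randomly chosen partner (untargeted swap)."""
    swapped = [dict(r) for r in records]
    n_pairs = int(rate * len(swapped)) // 2
    chosen = rng.sample(range(len(swapped)), 2 * n_pairs)
    for i, j in zip(chosen[:n_pairs], chosen[n_pairs:]):
        swapped[i][swap_attr], swapped[j][swap_attr] = (
            swapped[j][swap_attr],
            swapped[i][swap_attr],
        )
    return swapped


def contingency(records, a, b):
    """Two-way contingency table as a Counter keyed by (a-value, b-value)."""
    return Counter((r[a], r[b]) for r in records)


def total_variation(t1, t2):
    """Total variation distance between two tables with the same total count."""
    n = sum(t1.values())
    cells = set(t1) | set(t2)
    return 0.5 * sum(abs(t1[c] - t2[c]) for c in cells) / n


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_dummy(2000, rng)
    original = contingency(data, "region", "income")
    for rate in (0.0, 0.1, 0.3, 0.5):
        swapped = swap_records(data, "region", rate, rng)
        dist = total_variation(original, contingency(swapped, "region", "income"))
        print(f"swap rate {rate:.1f}: distance from original table = {dist:.3f}")
```

Because values are only exchanged between records, the one-way marginals of each variable are unchanged; only the joint distribution moves. A targeted variant would restrict `chosen` to records flagged as at-risk by some criterion, which is the setting the abstract reports as harder to interpret.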
Taxonomy
Topics: Healthcare Policy and Management · Census and Population Estimation · Survey Methodology and Nonresponse
