Privacy Amplification by Missing Data
Simon Roburin (LPSM (UMR_8001)), Rafaël Pinot (LPSM (UMR_8001)), Erwan Scornet (LPSM (UMR_8001))

TL;DR
This paper explores how missing data can act as a natural privacy amplification mechanism under differential privacy, potentially strengthening privacy guarantees without additional noise.
Contribution
It formally analyzes missing data as a privacy amplification method in differential privacy, providing the first theoretical validation of this concept.
Findings
Missing data can inherently improve privacy guarantees.
Incomplete datasets can lead to stronger differential privacy protections.
Theoretical analysis confirms the privacy amplification effect.
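To give a feel for what "amplification" means here, the sketch below evaluates the classical amplification-by-subsampling bound, under which an ε-differentially private mechanism applied to a random subsample that keeps each record independently with probability q satisfies ε'-differential privacy with ε' = log(1 + q(e^ε − 1)). Treating a record that is entirely missing completely at random as a record dropped by subsampling is an assumption made for illustration only; this is not the paper's own bound, and the names `amplified_epsilon` and `observe_prob` are hypothetical.

```python
import math


def amplified_epsilon(epsilon: float, observe_prob: float) -> float:
    """Classical subsampling amplification bound: log(1 + q * (e^eps - 1)).

    Assumption for illustration: each record is observed independently with
    probability `observe_prob` (q), and is otherwise entirely missing.
    This is an analogy, not the bound derived in the paper.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not 0.0 <= observe_prob <= 1.0:
        raise ValueError("observe_prob must lie in [0, 1]")
    # expm1 and log1p keep the computation accurate for small epsilon and q.
    return math.log1p(observe_prob * math.expm1(epsilon))


if __name__ == "__main__":
    base_epsilon = 1.0
    for q in (1.0, 0.75, 0.5, 0.25):
        eps_prime = amplified_epsilon(base_epsilon, q)
        print(f"observe_prob={q:.2f} -> effective epsilon {eps_prime:.4f}")
```

Under this assumed model, the effective ε equals the base ε when nothing is missing (q = 1) and shrinks toward 0 as records become less likely to be observed.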
Abstract
Privacy preservation is a fundamental requirement in many high-stakes domains such as medicine and finance, where sensitive personal data must be analyzed without compromising individual confidentiality. At the same time, these applications often involve datasets with missing values due to non-response, data corruption, or deliberate anonymization. Missing data is traditionally viewed as a limitation because it reduces the information available to analysts and can degrade model performance. In this work, we take an alternative perspective and study missing data from a privacy preservation standpoint. Intuitively, when features are missing, less information is revealed about individuals, suggesting that missingness could inherently enhance privacy. We formalize this intuition by analyzing missing data as a privacy amplification mechanism within the framework of differential privacy. We…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Cryptography and Data Security
