The Adverse Effects of Omitting Records in Differential Privacy: How Sampling and Suppression Degrade the Privacy--Utility Tradeoff (Long Version)
Alex Miranda-Pascual, Javier Parra-Arnau, Thorsten Strufe

TL;DR
This paper demonstrates that sampling and suppression, often used in differential privacy, can actually harm the privacy-utility tradeoff, challenging common assumptions about their benefits.
Contribution
The paper provides a theoretical and empirical analysis showing that sampling and suppression do not necessarily improve utility at the same privacy level in differential privacy.
Findings
Sampling degrades utility across various DP mechanisms.
Suppression strategies do not improve the privacy-utility tradeoff.
Uniform sampling is among the least harmful suppression methods.
Abstract
Sampling is renowned for its privacy amplification in differential privacy (DP), and is often assumed to improve the utility of a DP mechanism by allowing a noise reduction. In this paper, we further show that this last assumption is flawed: When measuring utility at equal privacy levels, sampling as preprocessing consistently yields penalties due to utility loss from omitting records over all canonical DP mechanisms (Laplace, Gaussian, exponential, and report noisy max), as well as recent applications of sampling, such as clustering. Extending this analysis, we investigate suppression as a generalized method of choosing, or omitting, records. Developing a theoretical analysis of this technique, we derive privacy bounds for arbitrary suppression strategies under unbounded approximate DP. We find that our tested suppression strategy also fails to improve the privacy--utility…
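The comparison "at equal privacy levels" can be illustrated with a minimal sketch. It is a toy counting query and not the paper's experimental setup. It assumes Poisson sampling with rate q and the standard amplification bound, under which an eps_base-DP mechanism run on the sample satisfies ln(1 + q(e^eps_base - 1))-DP. The function names, the choice of a count query, and the 1/q rescaling of the estimate are all assumptions made for this illustration.

```python
import math


def amplified_epsilon(eps_base: float, q: float) -> float:
    """Privacy level of an eps_base-DP mechanism applied to a Poisson sample (rate q)."""
    return math.log(1.0 + q * (math.exp(eps_base) - 1.0))


def base_epsilon_for_target(eps_target: float, q: float) -> float:
    """Largest eps_base such that the sampled mechanism is still eps_target-DP."""
    return math.log(1.0 + (math.exp(eps_target) - 1.0) / q)


def count_mse_no_sampling(eps_target: float) -> float:
    """MSE of a Laplace count (sensitivity 1): the variance of Lap(1/eps) is 2/eps^2."""
    return 2.0 / eps_target ** 2


def count_mse_with_sampling(n: int, eps_target: float, q: float) -> float:
    """MSE of the estimate (sample_count + Laplace noise) / q at the same eps_target.

    Hypothetical setup: each of the n records is kept independently with
    probability q, so sample_count ~ Binomial(n, q) and the estimate is unbiased.
    """
    eps_base = base_epsilon_for_target(eps_target, q)  # less noise is allowed on the sample
    noise_var = 2.0 / eps_base ** 2
    sampling_var = n * q * (1.0 - q)                   # variance of Binomial(n, q)
    return (sampling_var + noise_var) / q ** 2         # rescaling by 1/q inflates both terms


if __name__ == "__main__":
    n, eps = 1000, 1.0
    print("no sampling:", count_mse_no_sampling(eps))
    for q in (0.1, 0.5, 0.9):
        print("q =", q, "->", count_mse_with_sampling(n, eps, q))
```

In this toy setting the sampled estimator is never better: by convexity of the exponential, 1 + (e^eps - 1)/q <= e^(eps/q), so the rescaled noise alone is at least as large as without sampling, and the variance from omitted records comes on top.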
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Mobile Crowdsensing and Crowdsourcing
