Combining Public and Private Data
Cecilia Ferrando, Jennifer Gillenwater, Alex Kulesza

TL;DR
This paper introduces mixed estimators of the mean and median for combining public and private data under differential privacy. The mean estimator is optimized to minimize variance, and experiments show the mechanisms often outperform the baseline methods of Jorgensen et al. [2015].
Contribution
It presents mixed estimators of the mean and median that combine data with heterogeneous privacy needs: a variance-minimizing mean estimator and a median estimator based on the exponential mechanism.
Findings
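The mixed mean estimator is optimized to minimize variance, and the authors argue it is preferable to subsampling data in proportion to users' privacy needs.
The mixed median estimator is based on the exponential mechanism.
In experiments, the proposed mechanisms often outperform the baseline methods of Jorgensen et al. [2015].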
Abstract
Differential privacy is widely adopted to provide provable privacy guarantees in data analysis. We consider the problem of combining public and private data (and, more generally, data with heterogeneous privacy needs) for estimating aggregate statistics. We introduce a mixed estimator of the mean optimized to minimize the variance. We argue that our mechanism is preferable to techniques that preserve the privacy of individuals by subsampling data proportionally to the privacy needs of users. Similarly, we present a mixed median estimator based on the exponential mechanism. We compare our mechanisms to the methods proposed in Jorgensen et al. [2015]. Our experiments provide empirical evidence that our mechanisms often outperform the baseline methods.
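As a rough illustration of the mixed mean idea described in the abstract, the sketch below combines a public sample mean with a Laplace-noised private mean using the weight that minimizes the variance of the combination. It is not taken from the paper: the function names (`mixed_weight`, `mixed_mean`), the Laplace mechanism, the clipping bounds `lower` and `upper`, and the assumptions that the per-sample variance `sigma2` and the private sample size are publicly known and that both groups come from the same distribution are all illustrative choices.

```python
import numpy as np


def mixed_weight(n_pub, n_priv, epsilon, lower, upper, sigma2):
    """Weight on the public mean that minimizes the variance of the mix.

    Assumes (hypothetically) that sigma2, the per-sample variance, is public,
    so the weight does not depend on the private values themselves.
    """
    # Laplace scale for a mean of n_priv values clipped to [lower, upper].
    scale = (upper - lower) / (n_priv * epsilon)
    var_pub = sigma2 / n_pub                      # sampling variance only
    var_priv = sigma2 / n_priv + 2.0 * scale**2   # sampling + Laplace noise
    # Inverse-variance weighting of two independent unbiased estimates.
    return var_priv / (var_pub + var_priv)


def mixed_mean(public, private, epsilon, lower, upper, sigma2, rng=None):
    """Return (estimate, weight) for w * public_mean + (1 - w) * noisy_private_mean."""
    rng = np.random.default_rng() if rng is None else rng
    public = np.clip(np.asarray(public, dtype=float), lower, upper)
    private = np.clip(np.asarray(private, dtype=float), lower, upper)

    # epsilon-DP release of the private mean via the Laplace mechanism.
    scale = (upper - lower) / (len(private) * epsilon)
    noisy_private_mean = private.mean() + rng.laplace(0.0, scale)

    w = mixed_weight(len(public), len(private), epsilon, lower, upper, sigma2)
    return w * public.mean() + (1.0 - w) * noisy_private_mean, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.uniform(0.0, 1.0, size=1000)
    est, w = mixed_mean(data[:100], data[100:], epsilon=0.5,
                        lower=0.0, upper=1.0, sigma2=1.0 / 12.0, rng=rng)
    print(f"estimate={est:.4f}, weight on public mean={w:.3f}")
```

Under these assumptions, a smaller `epsilon` inflates the private noise term and shifts weight toward the public mean, while a very large `epsilon` recovers the plain pooled mean with weight `n_pub / (n_pub + n_priv)`.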
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Privacy, Security, and Data Protection
