In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning
Jiaqi Wang, Roei Schuster, Ilia Shumailov, David Lie, Nicolas Papernot

TL;DR
The noise that differentially private ensemble methods such as PATE add to their vote aggregation makes predictions stochastic. An adversary can exploit this stochasticity to recover the teachers' vote histograms and infer sensitive attributes of the input, which challenges common assumptions about what the privacy guarantee protects.
Contribution
The paper uncovers a new form of leakage in differentially private ensemble learning: the stochastic outputs of the noisy voting mechanism expose the teachers' vote distribution, which highlights the limits of current privacy guarantees.
Findings
Adversaries can extract sensitive attributes from vote histograms.
Adding more noise can paradoxically make the attack easier.
The attack does not directly violate the differential privacy guarantee, but it still breaches privacy norms.
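The second finding can be made concrete with a small worked case. The derivation below is an illustrative sketch, not taken from the paper: it assumes only two classes, independent Gaussian noise with standard deviation \sigma added to each vote count, and a noisy-argmax output. The symbols n_0, n_1, Z_0, Z_1, and p are introduced here for illustration.

```latex
% Teachers cast n_0 and n_1 votes; the mechanism adds Z_0, Z_1 ~ N(0, \sigma^2).
% Label 1 is returned iff n_1 + Z_1 > n_0 + Z_0, and Z_0 - Z_1 ~ N(0, 2\sigma^2), so
\[
  p \;=\; \Pr[\text{output} = 1]
    \;=\; \Phi\!\left(\frac{n_1 - n_0}{\sigma\sqrt{2}}\right),
  \qquad
  n_1 - n_0 \;=\; \sigma\sqrt{2}\,\Phi^{-1}(p).
\]
```

Under these assumptions, when \sigma is much smaller than |n_1 - n_0| the probability p is very close to 0 or 1, so repeated queries reveal little beyond which label won. When \sigma is comparable to the gap, p sits well inside (0, 1), and an estimate of p from repeated queries can be inverted to recover the gap itself. This is one way to see why more noise can make the histogram easier to extract.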
Abstract
When learning from sensitive data, care must be taken to ensure that training algorithms address privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a (possibly distributed) collection of teacher models via a voting mechanism. The mechanism adds noise to attain a differential privacy guarantee with respect to the teachers' training data. In this work, we observe that this use of noise, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information. For a given input, our adversary exploits this stochasticity to extract high-fidelity histograms of the votes submitted by the underlying teachers. From these histograms, the adversary can learn sensitive attributes of the input such as race, gender, or age. Although this attack does not directly violate the differential…
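To illustrate the mechanism and the leakage described above, here is a minimal Python sketch. It is not the authors' attack. It assumes a two-class problem, Gaussian noise on each vote count, and an adversary who knows the noise scale and the number of teachers and may repeat the same query many times. The function names `noisy_argmax` and `estimate_binary_histogram`, the vote counts, and all parameter values are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def noisy_argmax(votes, sigma, rng):
    """PATE-style aggregation sketch: add Gaussian noise to each
    vote count and return the label with the largest noisy count."""
    noisy = np.asarray(votes, dtype=float) + rng.normal(0.0, sigma, size=len(votes))
    return int(np.argmax(noisy))


def estimate_binary_histogram(query, num_teachers, sigma, num_queries):
    """Estimate a two-class vote histogram from repeated noisy answers.

    Assumes the adversary knows sigma and num_teachers (an assumption
    of this sketch) and that every teacher votes for label 0 or 1.
    """
    wins = sum(query() == 1 for _ in range(num_queries))
    # Clamp the empirical frequency away from 0 and 1 so inv_cdf stays finite.
    p_hat = min(max(wins / num_queries, 0.5 / num_queries), 1 - 0.5 / num_queries)
    # With Gaussian noise, P(label 1 wins) = Phi((n1 - n0) / (sigma * sqrt(2))).
    gap = sigma * np.sqrt(2.0) * NormalDist().inv_cdf(p_hat)
    n1 = (num_teachers + gap) / 2.0
    return num_teachers - n1, n1


rng = np.random.default_rng(0)
true_votes = [160, 90]   # hypothetical teacher votes for one input
sigma = 40.0             # hypothetical noise scale
est = estimate_binary_histogram(
    lambda: noisy_argmax(true_votes, sigma, rng),
    num_teachers=sum(true_votes),
    sigma=sigma,
    num_queries=20000,
)
print(est)
```

In this toy setting the estimate lands close to the true counts because the noise is comparable to the vote gap, so the output frequencies carry information about the gap. With a much smaller `sigma`, nearly every query returns the same label and the estimate degrades, which matches the finding that more noise can make the attack easier.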
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Internet Traffic Analysis and Secure E-voting
