On Reconstructing Training Data From Bayesian Posteriors and Trained Models
George Wynne

TL;DR
This paper introduces a mathematical framework for understanding and performing training data reconstruction attacks on machine learning models, highlighting vulnerabilities and proposing a new score matching approach for Bayesian and non-Bayesian models.
Contribution
It establishes a novel mathematical framework, characterizes vulnerable training data features, and introduces the first score matching method for Bayesian data reconstruction.
Findings
The training data features that are vulnerable to reconstruction can be characterized via a maximum mean discrepancy equivalence.
A new score matching framework enables data reconstruction in both Bayesian and non-Bayesian models.
The approach reveals significant privacy risks in releasing model specifications.
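For readers unfamiliar with maximum mean discrepancy (MMD), the following is a minimal sketch of a standard biased estimate of squared MMD between two samples. It is not the paper's construction: the Gaussian kernel, the `bandwidth` parameter, and the function names `gaussian_kernel`, `mean_kernel`, and `mmd_squared` are all assumptions chosen for illustration, and the summary above does not state which kernel or which equivalence the paper uses.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats.

    The kernel choice is an assumption for illustration only.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )
```

Under these assumptions, `mmd_squared(xs, ys)` is close to zero when the two samples come from the same distribution and grows as the distributions move apart, which is the sense in which MMD compares a candidate reconstruction with the true training data.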
Abstract
Publicly releasing the specification of a model with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine learning methods. This paper makes three primary contributions: establishing a mathematical framework to express the problem; characterising the features of the training data that are vulnerable via a maximum mean discrepancy equivalence; and outlining a score matching framework for reconstructing data in both Bayesian and non-Bayesian models, the former being a first in the literature.
Taxonomy
Topics: Scheduling and Timetabling Solutions · Intelligent Tutoring Systems and Adaptive Learning
