On Reconstructing Training Data From Bayesian Posteriors and Trained Models

George Wynne

arXiv:2507.18372·stat.ML·July 25, 2025

TL;DR

This paper introduces a mathematical framework for training data reconstruction attacks on machine learning models, identifies which features of the training data are vulnerable, and proposes a score matching approach to reconstruction for both Bayesian and non-Bayesian models.

Contribution

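The paper establishes a mathematical framework for expressing the reconstruction problem, characterizes the vulnerable features of the training data via a maximum mean discrepancy equivalence, and outlines a score matching framework for data reconstruction, the first in the literature for Bayesian models.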

Findings

01. The vulnerable features of the training data can be characterized via a maximum mean discrepancy equivalence.

02. A score matching framework enables data reconstruction in both Bayesian and non-Bayesian models.

03. Publicly releasing a model's specification together with its trained parameters exposes the training data to reconstruction attacks.

Abstract

Publicly releasing the specification of a model with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine learning methods. This paper makes three primary contributions: establishing a mathematical framework to express the problem; characterising the features of the training data that are vulnerable via a maximum mean discrepancy equivalence; and outlining a score matching framework for reconstructing data in both Bayesian and non-Bayesian models, the former being a first in the literature.
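
For readers unfamiliar with maximum mean discrepancy (MMD), the following is a minimal sketch of the standard biased estimator of squared MMD between two samples. It is a generic illustration of the quantity named in the abstract, not the paper's construction: the Gaussian kernel, the `bandwidth` parameter, and the function names `rbf_kernel` and `mmd_squared` are all assumptions introduced here.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b.

    The kernel choice and bandwidth are illustrative assumptions.
    """
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 2))             # stand-in for training data
    candidate = rng.normal(loc=2.0, size=(200, 2))  # stand-in for a poor reconstruction
    print("MMD^2(train, train)     =", mmd_squared(train, train))
    print("MMD^2(train, candidate) =", mmd_squared(train, candidate))
```

The estimate is zero when the two samples coincide and grows as their distributions move apart, which is why MMD is a natural way to state which features of a dataset two candidate datasets share.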

Taxonomy

Topics: Scheduling and Timetabling Solutions · Intelligent Tutoring Systems and Adaptive Learning