Bayesian Inference of Training Dataset Membership

Yongchao Huang

arXiv:2506.00701·cs.LG·June 3, 2025

Open Access

TL;DR

This paper introduces a Bayesian method for membership inference. It uses post-hoc metrics from a trained model to compute the posterior probability that a dataset was part of the training data, supporting privacy analysis without extensive retraining.

Contribution

It presents a novel, interpretable Bayesian approach for membership inference that relies on post-hoc metrics, avoiding the need for shadow models or internal model access.

Findings

01  Effective on synthetic datasets at distinguishing member from non-member datasets

02  Capable of detecting distribution shifts in data

03  Provides a practical, interpretable alternative to existing methods

Abstract

Determining whether a dataset was part of a machine learning model's training data pool can reveal privacy vulnerabilities, a challenge often addressed through membership inference attacks (MIAs). Traditional MIAs typically require access to model internals or rely on computationally intensive shadow models. This paper proposes an efficient, interpretable and principled Bayesian inference method for membership inference. By analyzing post-hoc metrics such as prediction error, confidence (entropy), perturbation magnitude, and dataset statistics from a trained ML model, our approach computes posterior probabilities of membership without requiring extensive model training. Experimental results on synthetic datasets demonstrate the method's effectiveness in distinguishing member from non-member datasets. Beyond membership inference, this method can also detect distribution shifts, offering…
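The posterior computation described in the abstract can be illustrated with a minimal sketch. The truncated abstract does not specify the likelihood model, so everything below is an assumption made for illustration only: each metric is treated as independent given the membership label (a naive-Bayes simplification), each likelihood is Gaussian, and the function name `membership_posterior`, the metric names, and all parameter values are hypothetical rather than taken from the paper.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate Gaussian N(mean, std^2) at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def membership_posterior(metrics, member_params, nonmember_params, prior_member=0.5):
    """Return P(member | metrics) via Bayes' rule.

    Assumptions (illustrative, not from the paper): metrics are conditionally
    independent given the membership label, and each follows a Gaussian whose
    (mean, std) is supplied per hypothesis. prior_member must lie in (0, 1).
    """
    log_member = math.log(prior_member)
    log_nonmember = math.log(1.0 - prior_member)
    for name, value in metrics.items():
        log_member += gaussian_logpdf(value, *member_params[name])
        log_nonmember += gaussian_logpdf(value, *nonmember_params[name])
    # Normalise in log space to avoid underflow when likelihoods are tiny.
    shift = max(log_member, log_nonmember)
    w_member = math.exp(log_member - shift)
    w_nonmember = math.exp(log_nonmember - shift)
    return w_member / (w_member + w_nonmember)


# Hypothetical (mean, std) per metric under each hypothesis.
member_params = {"prediction_error": (0.1, 0.05), "entropy": (0.2, 0.1)}
nonmember_params = {"prediction_error": (0.3, 0.05), "entropy": (0.6, 0.1)}

observed = {"prediction_error": 0.12, "entropy": 0.25}
posterior = membership_posterior(observed, member_params, nonmember_params)
print(f"P(member | metrics) = {posterior:.3f}")
```

In this sketch, a dataset with low prediction error and low entropy is scored as a likely member because both values sit closer to the assumed member distributions. The same comparison of likelihoods could in principle flag a distribution shift, since data that fits neither hypothesis well would yield very low likelihoods under both.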

Taxonomy

Topics: Data Quality and Management