# Privacy Auditing Synthetic Data Release through Local Likelihood Attacks

**Authors:** Joshua Ward, Chi-Hua Wang, Guang Cheng

arXiv: 2508.21146 · 2026-05-12

## TL;DR

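This paper introduces Gen-LRA, a no-box membership inference attack that audits synthetic tabular data for privacy leakage by exploiting generative models' tendency to overfit to certain regions of the training distribution. The attack comes with a theoretical analysis that predicts when it should succeed, and it outperforms competing attacks in the paper's benchmark, especially at low false positive rates.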

## Contribution

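The paper presents Gen-LRA, a computationally efficient local likelihood ratio attack that assumes no knowledge of or access to the generative model. Its score admits a closed-form characterization as a localized density-ratio statistic, and in the paper's benchmark it outperforms existing attacks for privacy auditing of synthetic data.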
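
To make "local likelihood ratio" concrete, here is a worked sketch under assumptions that this summary does not confirm: the surrogate is a fixed-bandwidth kernel density estimator with kernel $K_h$, $R = \{r_1, \dots, r_n\}$ is a reference sample, $x^*$ is the test observation, and $N_k(x^*)$ is the set of the $k$ synthetic rows nearest to $x^*$. All of these symbols are illustrative notation, and the paper's exact estimator may differ.

$$
\Lambda(x^*) \;=\; \sum_{s \in N_k(x^*)} \log \frac{\hat p_{R \cup \{x^*\}}(s)}{\hat p_R(s)},
\qquad
\hat p_R(s) \;=\; \frac{1}{n} \sum_{i=1}^{n} K_h(s - r_i).
$$

Adding $x^*$ to the reference sample only appends one kernel term, so $\hat p_{R \cup \{x^*\}}(s) = \bigl(n\,\hat p_R(s) + K_h(s - x^*)\bigr)/(n+1)$, and therefore

$$
\Lambda(x^*) \;=\; \sum_{s \in N_k(x^*)} \left[ \log \frac{n}{n+1} + \log\!\left( 1 + \frac{K_h(s - x^*)}{n\,\hat p_R(s)} \right) \right].
$$

Under these assumptions the score is large when synthetic rows sit close to $x^*$ in a region where the reference density is low, which is the kind of local overfitting signal the attack is described as exploiting.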

## Key findings

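- Gen-LRA consistently outperforms competing MIAs across the benchmarked datasets, generative model architectures, and attack parameters.
- Under a general model of local overfitting, the Gen-LRA score provably exhibits a mean-score gap between members and non-members, which yields testable predictions for when the attack should succeed.
- Gen-LRA's gains over competing attacks are especially strong at low false positive rates.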
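
Performance at low false positive rates is commonly reported as the true positive rate (TPR) at a fixed false positive rate (FPR). The sketch below shows one way to compute it from attack scores. The function name `tpr_at_fpr` and the convention that higher scores mean "more likely a member" are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np

def tpr_at_fpr(member_scores, non_member_scores, target_fpr=0.01):
    """Return (tpr, achieved_fpr) at the strictest threshold whose FPR <= target_fpr.

    Assumes higher scores indicate membership (hypothetical convention).
    """
    member_scores = np.asarray(member_scores, dtype=float)
    non_member_scores = np.asarray(non_member_scores, dtype=float)

    # Number of non-members we may misclassify without exceeding the target FPR.
    allowed = int(np.floor(target_fpr * len(non_member_scores)))

    # Use the (allowed + 1)-th largest non-member score as the threshold and
    # flag a record as a member only if its score is strictly greater.
    sorted_desc = np.sort(non_member_scores)[::-1]
    threshold = sorted_desc[allowed]

    achieved_fpr = float(np.mean(non_member_scores > threshold))
    tpr = float(np.mean(member_scores > threshold))
    return tpr, achieved_fpr
```

Because the threshold is taken from the non-member scores themselves, the achieved FPR never exceeds the target, though ties can make it lower.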

## Abstract

Auditing the privacy leakage of synthetic data is an important but unresolved problem. Existing privacy auditing frameworks for synthetic data rely on heuristics and unrealistic assumptions about model access, offering limited ability to describe or detect the privacy exposure of training data through synthetic data release. In this paper, we study designing membership inference attacks (MIAs) that specifically exploit the observation that tabular generative models tend to significantly overfit to certain regions of the training distribution. We propose *Generative Likelihood Ratio Attack* (Gen-LRA), a novel, computationally efficient No-Box MIA that, with no assumption of model knowledge or access, formulates its attack by evaluating the influence a test observation has on a surrogate model's estimate of a local likelihood ratio over the synthetic data. We develop a theoretical framework for the attack: we show that the Gen-LRA score admits a closed-form characterization as a localized density-ratio statistic, and we prove that under a general model of local overfitting it produces a provable mean-score gap between members and non-members, yielding testable predictions for when the attack should succeed. We validate these predictions in a controlled simulation study and assess Gen-LRA against a comprehensive benchmark spanning diverse datasets, generative model architectures, and attack parameters. Across metrics, Gen-LRA consistently dominates competing MIAs, with especially strong gains at low false positive rates. These results underscore Gen-LRA's effectiveness as a privacy auditing tool for the release of synthetic data, and highlight the significant privacy risks posed by generative model overfitting in real-world applications.
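
The attack described in the abstract can be sketched in a few lines. This is a minimal illustration of one plausible reading, not the authors' implementation: it assumes numeric features, an isotropic Gaussian kernel density estimator with a fixed bandwidth as the surrogate model, and "local" meaning the `k` synthetic rows nearest to the test observation. The names `gaussian_kde_logpdf` and `gen_lra_score`, and the defaults for `k` and `bandwidth`, are hypothetical.

```python
import numpy as np

def gaussian_kde_logpdf(points, data, bandwidth):
    """Log-density at `points` of an isotropic Gaussian KDE fit on `data`."""
    d = data.shape[1]
    # Pairwise squared distances, shape (n_points, n_data).
    sq = ((points[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    log_kernel = -0.5 * sq / bandwidth**2 - 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    # Numerically stable log of the mean kernel value over `data`.
    m = log_kernel.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_kernel - m).mean(axis=1, keepdims=True))).ravel()

def gen_lra_score(x_test, reference, synthetic, k=10, bandwidth=0.5):
    """Local log-likelihood ratio of synthetic rows with vs. without x_test.

    Assumption: the surrogate is a fixed-bandwidth Gaussian KDE fit on a
    reference sample; the attacker never touches the generative model.
    """
    # Local region: the k synthetic rows closest to the test observation.
    dist = np.linalg.norm(synthetic - x_test, axis=1)
    local = synthetic[np.argsort(dist)[:k]]

    # Surrogate fit on the reference sample plus the test observation.
    with_x = np.vstack([reference, x_test[None, :]])

    log_p_with = gaussian_kde_logpdf(local, with_x, bandwidth)
    log_p_without = gaussian_kde_logpdf(local, reference, bandwidth)
    # A higher score means x_test makes the nearby synthetic rows more likely.
    return float((log_p_with - log_p_without).sum())
```

A record whose neighborhood is densely populated by synthetic rows, relative to what the reference sample alone would explain, receives a high score and is flagged as a likely training member. In practice `k` and `bandwidth` would need tuning, and mixed-type tabular data would need encoding before distances are meaningful.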

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21146/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21146/full.md

## References

55 references — full list in the complete paper: https://tomesphere.com/paper/2508.21146/full.md

---
Source: https://tomesphere.com/paper/2508.21146