Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy
Lucas Rosenblatt, Bernease Herman, Anastasia Holovenko, Wonkwon Lee, Joshua Loftus, Elizabeth McKinnie, Taras Rumezhak, Andrii Stadnik, Bill Howe, Julia Stoyanovich

TL;DR
This paper proposes a new evaluation methodology for differentially private (DP) synthetic data based on epistemic parity: it checks whether published conclusions still hold when the original analyses are rerun on synthetic data instead of the real data.
Contribution
It introduces an evaluation framework based on epistemic parity that reproduces the empirical conclusions of published studies on both real and synthetic data and compares them. This gives a more practical assessment of the utility of DP mechanisms than error on proxy tasks.
Findings
State-of-the-art DP synthesizers achieve high epistemic parity for several papers.
Some findings remain difficult to reproduce across all synthesizers.
The methodology automates the reproduction of empirical claims to evaluate utility.
Abstract
Differential privacy (DP) data synthesizers support public release of sensitive information, offering theoretical guarantees for privacy but limited evidence of utility in practical settings. Utility is typically measured as the error on representative proxy tasks, such as descriptive statistics, accuracy of trained classifiers, or performance over a query workload. The ability of these results to generalize to practitioners' experience has been questioned in a number of settings, including the U.S. Census. In this paper, we propose an evaluation methodology for synthetic data that avoids assumptions about the representativeness of proxy tasks, instead measuring the likelihood that published conclusions would change had the authors used synthetic data, a condition we call epistemic parity. Our methodology consists of reproducing empirical conclusions of peer-reviewed papers on real,…
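To make the idea concrete, here is a minimal sketch of how an epistemic-parity score could be computed. It assumes that each published finding can be encoded as a yes/no check on a dataset, and it scores a finding by the fraction of synthetic datasets on which that finding still holds, counting only findings that hold on the real data. The names `epistemic_parity`, `group_mean`, the finding labels and the toy rows are hypothetical illustrations, not the authors' code or data, and the paper's actual scoring procedure may differ.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Dataset = List[Row]
# Hypothetical representation: a finding is a yes/no check on a dataset.
Finding = Callable[[Dataset], bool]


def group_mean(data: Dataset, col: str, **where: float) -> float:
    """Mean of `col` over rows matching all `where` filters (NaN if none match)."""
    values = [r[col] for r in data if all(r[k] == v for k, v in where.items())]
    return mean(values) if values else float("nan")


def epistemic_parity(
    findings: Dict[str, Finding],
    real: Dataset,
    synthetic_runs: Sequence[Dataset],
) -> Dict[str, float]:
    """Score each finding that holds on the real data by the fraction of
    synthetic datasets on which it also holds (1.0 = always reproduced)."""
    scores: Dict[str, float] = {}
    for name, check in findings.items():
        if not check(real):
            continue  # only conclusions supported by the real data are scored
        scores[name] = mean(1.0 if check(s) else 0.0 for s in synthetic_runs)
    return scores


# Toy example with made-up rows (not data from the paper).
real = [
    {"degree": 1, "income": 60.0}, {"degree": 1, "income": 55.0},
    {"degree": 0, "income": 40.0}, {"degree": 0, "income": 45.0},
]
synthetic_runs = [
    [{"degree": 1, "income": 50.0}, {"degree": 0, "income": 45.0}],
    [{"degree": 1, "income": 40.0}, {"degree": 0, "income": 48.0}],
]
findings: Dict[str, Finding] = {
    # NaN from an empty subgroup compares False, so the finding counts as lost.
    "degree_earns_more": lambda d: group_mean(d, "income", degree=1)
    > group_mean(d, "income", degree=0),
    "mean_income_above_40": lambda d: group_mean(d, "income") > 40.0,
}

scores = epistemic_parity(findings, real, synthetic_runs)
print(scores)                  # {'degree_earns_more': 0.5, 'mean_income_above_40': 1.0}
print(mean(scores.values()))   # 0.75
```

In this toy run, the group comparison survives in only one of the two synthetic datasets, while the threshold on the overall mean survives in both. This mirrors the summary's point that some findings are harder to reproduce than others. One minus a finding's score plays the role of the "likelihood that the conclusion would change" described in the abstract.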
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Advanced Causal Inference Techniques
