Joint Selection: Adaptively Incorporating Public Information for Private Synthetic Data
Miguel Fuentes, Brett Mullins, Ryan McKenna, Gerome Miklau, Daniel Sheldon

TL;DR
This paper introduces jam-pgm, a novel mechanism that adaptively combines public and private data in graphical models to improve synthetic data quality, even with biased public data.
Contribution
The paper develops jam-pgm, a mechanism that expands the adaptive measurements framework to jointly select between measuring public data and private data for differentially private synthetic data generation.
Findings
Outperforms both publicly assisted and non-publicly-assisted synthetic data generation mechanisms.
Remains effective even when the public data distribution is biased.
Allows public data to be included in graphical-model-based mechanisms.
Abstract
Mechanisms for generating differentially private synthetic data based on marginals and graphical models have been successful in a wide range of settings. However, one limitation of these methods is their inability to incorporate public data. Initializing a data generating model by pre-training on public data has been shown to improve the quality of synthetic data, but this technique is not applicable when model structure is not determined a priori. We develop the mechanism jam-pgm, which expands the adaptive measurements framework to jointly select between measuring public data and private data. This technique allows for public data to be included in a graphical-model-based mechanism. We show that jam-pgm is able to outperform both publicly assisted and non-publicly-assisted synthetic data generation mechanisms even when the public data distribution is biased.
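To make the idea of jointly selecting between public and private measurements concrete, here is a minimal sketch of one selection round. It is an illustration under stated assumptions, not the paper's algorithm: the function names (`marginal_counts`, `exponential_mechanism`, `joint_select`), the two score formulas, and the `sensitivity` argument are all hypothetical. Each candidate marginal appears twice, once per source, and a standard exponential mechanism picks among all (marginal, source) pairs. The private option is penalized by the expected size of the Gaussian noise it would receive; the public option is penalized by how far the rescaled public marginal sits from the private one.

```python
import numpy as np

def marginal_counts(data, cols, sizes):
    """Contingency table of integer-coded `data` over the columns `cols`."""
    counts = np.zeros([sizes[c] for c in cols])
    np.add.at(counts, tuple(data[:, c] for c in cols), 1)
    return counts

def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Sample index i with probability proportional to exp(eps * score_i / (2 * sensitivity))."""
    logits = 0.5 * epsilon * np.asarray(scores, dtype=float) / sensitivity
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

def joint_select(private, public, model_est, candidates, sizes,
                 epsilon, sigma, sensitivity, rng):
    """Choose one (cols, source) pair, with source in {"private", "public"}.

    Hypothetical scores (assumptions for illustration only):
      private: L1 model error minus expected L1 norm of Gaussian noise
      public : L1 model error minus L1 gap between public and private counts
    `sensitivity` must be derived by the caller for whatever score is used.
    """
    n = len(private)
    options, scores = [], []
    for cols in candidates:
        priv = marginal_counts(private, cols, sizes)
        # Rescale public counts to the private dataset size.
        pub = marginal_counts(public, cols, sizes) * (n / len(public))
        model_err = np.abs(priv - model_est[cols]).sum()
        # E|N(0, sigma^2)| = sigma * sqrt(2 / pi), summed over all cells.
        noise_cost = sigma * np.sqrt(2 / np.pi) * priv.size
        bias_cost = np.abs(priv - pub).sum()
        options += [(cols, "private"), (cols, "public")]
        scores += [model_err - noise_cost, model_err - bias_cost]
    return options[exponential_mechanism(scores, epsilon, sensitivity, rng)]
```

In this sketch, accurate public data wins the selection because it carries no noise cost, while heavily biased public data loses to a noisy private measurement. That is the qualitative behavior the abstract describes, though the actual scoring and privacy accounting in jam-pgm may differ.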
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Cryptography and Data Security
Methods: Network On Network
