Post-processing Private Synthetic Data for Improving Utility on Selected Measures

Hao Wang; Shivchander Sudalairaj; John Henning; Kristjan Greenewald; Akash Srivastava

arXiv:2305.15538·cs.LG·October 20, 2023·1 cites

TL;DR

This paper presents a post-processing method that resamples private synthetic data to improve its utility on measures selected by the end user, while preserving privacy guarantees. Experiments on multiple benchmark datasets support the approach.

Contribution

The paper introduces a post-processing resampling technique that improves the utility of synthetic data on targeted measures without compromising privacy.

Findings

01 Consistently improves utility across multiple datasets

02 Effective with various synthetic data generation algorithms

03 Maintains strong privacy guarantees

Abstract

Existing private synthetic data generation algorithms are agnostic to downstream tasks. However, end users may have specific requirements that the synthetic data must satisfy. Failure to meet these requirements could significantly reduce the utility of the data for downstream use. We introduce a post-processing technique that improves the utility of the synthetic data with respect to measures selected by the end user, while preserving strong privacy guarantees and dataset quality. Our technique involves resampling from the synthetic data to filter out samples that do not meet the selected utility measures, using an efficient stochastic first-order algorithm to find optimal resampling weights. Through comprehensive numerical experiments, we demonstrate that our approach consistently improves the utility of synthetic data across multiple benchmark datasets and state-of-the-art synthetic…
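The resampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single hypothetical utility measure (the mean of one feature), assumes exponential-tilting weights, and uses plain full-batch gradient steps where the abstract mentions a stochastic first-order method. The names `fit_tilting_weights`, `resample`, and `target` are illustrative and do not come from the paper.

```python
import numpy as np

def fit_tilting_weights(features, target, steps=500, lr=0.5):
    """Fit weights w_i proportional to exp(features_i @ lam) so that the
    weighted mean of the features approaches `target` (assumed form)."""
    n, d = features.shape
    lam = np.zeros(d)
    for _ in range(steps):
        logits = features @ lam
        w = np.exp(logits - logits.max())   # subtract max for numerical stability
        w /= w.sum()
        # First-order step: move lam toward closing the gap between
        # the target value and the current weighted mean.
        lam += lr * (target - w @ features)
    logits = features @ lam
    w = np.exp(logits - logits.max())
    return w / w.sum()

def resample(data, weights, size, seed=0):
    """Draw rows of `data` with replacement according to `weights`."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(data), size=size, replace=True, p=weights)
    return data[idx]

# Toy stand-in for synthetic data: one feature with mean near 0.
rng = np.random.default_rng(0)
synthetic = rng.normal(0.0, 1.0, size=(5000, 1))
target = np.array([0.3])   # hypothetical value the selected measure should take

weights = fit_tilting_weights(synthetic, target)
resampled = resample(synthetic, weights, size=5000)
```

Rows whose feature values pull the mean toward the target receive larger weights, so they appear more often in `resampled`, while the resampled set is still drawn only from the original synthetic rows.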

Videos

Post-processing Private Synthetic Data for Improving Utility on Selected Measures · SlidesLive

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Advanced Data Storage Technologies · Traffic Prediction and Management Techniques