Differentially Private Synthetic Heavy-tailed Data
Tran Tran, Matthew Reimherr, Aleksandra Slavković

TL;DR
This paper uses the K-Norm Gradient Mechanism (KNG) with quantile regression to generate differentially private (DP) synthetic heavy-tailed data, giving a formal privacy guarantee while retaining high utility for economic research datasets.
Contribution
It proposes a novel DP synthetic data generation method tailored for heavy-tailed data using KNG and quantile regression, with an efficient stepwise implementation.
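To make the KNG-plus-quantile-regression idea concrete, here is a minimal sketch of the simplest case, assuming an intercept-only model (one quantile of one variable, no covariates) and a public bounding interval [lo, hi]. KNG samples from a density proportional to exp(-ε/(2Δ)·|g(q)|), where g(q) = Σᵢ(1{yᵢ ≤ q} − τ) is the check-loss subgradient and Δ = 1 is its sensitivity. Because g is constant between order statistics, the density can be sampled exactly. The names `kng_quantile` and `stepwise_kng_quantiles`, the even budget split, and reading "stepwise" as estimating quantiles one at a time, each searched only above the previous estimate, are hypothetical illustrations, not the paper's implementation.

```python
import math
import random


def kng_quantile(data, tau, epsilon, lo, hi, rng=random):
    """Draw one epsilon-DP estimate of the tau-quantile of `data` on [lo, hi].

    Sketch of KNG for an intercept-only quantile regression: the sampling
    density is proportional to exp(-epsilon / (2 * sens) * |g(q)|), where
    g(q) = sum_i (1{y_i <= q} - tau) is the check-loss subgradient.
    Replacing one record changes g(q) by at most 1, so sens = 1.
    """
    if hi <= lo:
        return lo  # degenerate domain: nothing to sample
    sens = 1.0
    n = len(data)
    # Clip to the public domain and sort; g(q) is constant between order statistics.
    ys = sorted(min(max(y, lo), hi) for y in data)
    edges = [lo] + ys + [hi]
    log_w = []
    for k in range(n + 1):
        # For q in the open interval (edges[k], edges[k + 1]), exactly k clipped points are <= q.
        length = edges[k + 1] - edges[k]
        if length <= 0:
            log_w.append(-math.inf)  # zero-width interval (ties or boundary)
            continue
        log_w.append(math.log(length) - epsilon / (2 * sens) * abs(k - n * tau))
    # Pick an interval with probability proportional to length * density (shift by max for stability).
    m = max(log_w)
    weights = [math.exp(w - m) for w in log_w]
    k = rng.choices(range(n + 1), weights=weights)[0]
    # The density is flat inside the chosen interval, so draw uniformly there.
    return rng.uniform(edges[k], edges[k + 1])


def stepwise_kng_quantiles(data, taus, epsilon, lo, hi, rng=random):
    """Hypothetical stepwise variant: estimate quantiles in increasing order of tau,
    searching each one only above the previous estimate so the outputs are monotone."""
    eps_each = epsilon / len(taus)  # even split; sequential composition gives epsilon-DP overall
    out, lower = [], lo
    for tau in sorted(taus):
        q = kng_quantile(data, tau, eps_each, lower, hi, rng)
        out.append(q)
        lower = q  # previous DP output becomes the lower bound for the next search
    return out
```

For example, `kng_quantile(list(range(1, 1001)), 0.5, 5.0, 0.0, 2000.0, random.Random(0))` returns a value close to the sample median of about 500; a smaller `epsilon` spreads the draw more widely. The exact interval-based sampler relies on the one-dimensional, piecewise-constant gradient; with covariates that structure is lost and a general-purpose sampler would typically be needed instead.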
Findings
Achieves better utility than existing KNG methods at the same privacy level.
Successfully applied to the Synthetic Longitudinal Business Database.
Demonstrates strong privacy guarantees with high data utility.
Abstract
The U.S. Census Bureau's Longitudinal Business Database (LBD) contains employment and payroll information for all U.S. establishments and firms dating back to 1976 and is an invaluable resource for economic research. However, the sensitive information in the LBD requires confidentiality measures, which the U.S. Census Bureau partly addressed by releasing a synthetic version of the data (SynLBD) that protects firms' privacy while remaining usable for research, but without provable privacy guarantees. In this paper, we propose using the framework of differential privacy (DP), which offers strong, provable privacy protection against arbitrary adversaries, to generate synthetic heavy-tailed data with a formal privacy guarantee while preserving high levels of utility. We propose using the K-Norm Gradient Mechanism (KNG) with quantile regression for DP synthetic data generation. The proposed…
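The abstract stops before describing how synthetic records are produced from the private estimates. One common approach, assumed here rather than taken from the paper, is inverse-CDF sampling: draw a uniform quantile level u and read off a value by linearly interpolating between neighbouring DP quantile estimates, using public bounds lo and hi as the 0- and 1-quantiles. Since only already-private outputs are used, this step is post-processing and spends no extra privacy budget. The function `synthesize_from_quantiles` and its parameters are hypothetical.

```python
import bisect
import random


def synthesize_from_quantiles(taus, qs, lo, hi, n_synth, rng=random):
    """Draw synthetic values by inverting a piecewise-linear quantile function.

    taus: strictly increasing quantile levels in (0, 1).
    qs: matching DP quantile estimates, assumed nondecreasing.
    lo, hi: public bounds used as the 0- and 1-quantiles.
    Only already-private outputs are used, so this step is post-processing.
    """
    grid_t = [0.0] + list(taus) + [1.0]
    grid_q = [lo] + list(qs) + [hi]
    out = []
    for _ in range(n_synth):
        u = rng.random()  # uniform quantile level in [0, 1)
        # Locate j with grid_t[j] <= u < grid_t[j + 1].
        j = bisect.bisect_right(grid_t, u) - 1
        t0, t1 = grid_t[j], grid_t[j + 1]
        q0, q1 = grid_q[j], grid_q[j + 1]
        # Linear interpolation between neighbouring quantile estimates.
        out.append(q0 + (u - t0) / (t1 - t0) * (q1 - q0))
    return out
```

With `taus=[0.5]`, `qs=[10.0]`, `lo=0.0` and `hi=20.0`, roughly half of the draws fall below 10. Note that every value above the largest estimated quantile is interpolated toward the assumed bound `hi`, so for heavy-tailed variables the choice of that bound and of the upper quantile levels largely determines how the tail is reproduced.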
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Advanced Causal Inference Techniques · Statistical Methods and Bayesian Inference
