Generating Synthetic Data with Locally Estimated Distributions for Disclosure Control
Ali Furkan Kalay

TL;DR
This paper presents the Local Resampler (LR), a novel synthetic data generation method that reduces privacy risks from outliers while maintaining data utility, especially for complex distributions, with low computational costs.
Contribution
The paper introduces the Local Resampler (LR), a new approach that mitigates outlier disclosure risks in synthetic data while preserving complex distribution features.
Findings
LR effectively reduces outlier-driven disclosure risks.
LR accurately replicates multimodal, skewed, and non-convex distributions.
LR is computationally efficient with small samples.
Abstract
Sensitive datasets are often underutilized in research and industry due to privacy concerns, limiting the potential of valuable data-driven insights. Synthetic data generation presents a promising solution to address this challenge by balancing privacy protection with data utility. This paper introduces a new approach to mitigate privacy risks associated with outlier observations in synthetic datasets: the Local Resampler (LR). The LR leverages the k-nearest neighbors algorithm to generate synthetic data while minimizing disclosure risks by underrepresenting outliers, even when they are not detectable in marginal distributions. Theoretical and empirical analyses demonstrate that the LR effectively mitigates outlier-driven disclosure risks, and accurately replicates multimodal, skewed, and non-convex support distributions. The semiparametric nature of the LR ensures a low computational…
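The abstract does not spell out the LR algorithm, so the following is only a minimal sketch of one generic way a k-nearest-neighbors step could drive local resampling. It is not the paper's method. The function name `local_resample`, the default `k`, and the choice of a Gaussian fitted to each neighborhood are all assumptions made for illustration. The intuition it tries to convey is that a synthetic point drawn around the mean of an observation's neighborhood is pulled toward the bulk of the data, which is one way an outlier can end up underrepresented.

```python
import numpy as np


def local_resample(X, k=10, seed=0):
    """Hypothetical kNN-based local resampler (illustrative only).

    For each row of X, fit a Gaussian to its k nearest neighbors
    (the row itself included) and draw one synthetic point from it.
    This is an assumed stand-in, not the LR as defined in the paper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of rows")

    # Pairwise squared Euclidean distances (brute force, O(n^2) memory).
    diffs = X[:, None, :] - X[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diffs, diffs)

    # Indices of the k closest rows for every row.
    nn_idx = np.argsort(dist2, axis=1)[:, :k]

    synthetic = np.empty_like(X)
    for i in range(n):
        neighbors = X[nn_idx[i]]
        mu = neighbors.mean(axis=0)
        # reshape keeps the covariance 2-D even when d == 1.
        cov = np.cov(neighbors, rowvar=False).reshape(d, d)
        synthetic[i] = rng.multivariate_normal(mu, cov)
    return synthetic
```

Called as `local_resample(X, k=10)` on an `(n, d)` array, the sketch returns an array of the same shape. The brute-force distance matrix is only practical for small samples; a tree-based neighbor search would be the usual substitute for larger ones.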
Taxonomy
Topics: Statistical Methods and Inference · Machine Learning and Data Classification · Machine Learning and Algorithms
