Differentially Private Synthetic High-dimensional Tabular Stream
Girish Kumar, Thomas Strohmer, Roman Vershynin

TL;DR
This paper introduces a differentially private streaming algorithm that generates and updates synthetic high-dimensional tabular data over time. It satisfies differential privacy for the entire input stream and preserves utility as the underlying private data changes.
Contribution
It presents a novel framework for continual differential privacy over streams of high-dimensional tabular data, extending offline synthetic data generation to settings where the private data changes over time.
Findings
Differential privacy guaranteed for the entire input stream (continual differential privacy)
Utility demonstrated through experiments on real-world datasets
Applicable to high-dimensional tabular data
Abstract
While differentially private synthetic data generation has been explored extensively in the literature, how to update this data in the future if the underlying private data changes is much less understood. We propose an algorithmic framework for streaming data that generates multiple synthetic datasets over time, tracking changes in the underlying private data. Our algorithm satisfies differential privacy for the entire input stream (continual differential privacy) and can be used for high-dimensional tabular data. Furthermore, we show the utility of our method via experiments on real-world datasets. The proposed algorithm builds upon a popular select, measure, fit, and iterate paradigm (used by offline synthetic data generation algorithms) and private counters for streams.
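The abstract relies on private counters for streams but does not say which construction it uses. One standard choice is the binary tree mechanism, sketched below as an illustration only; it is not the authors' implementation. The class name `BinaryTreeCounter` and its parameters are hypothetical, and the code assumes each stream element lies in [0, 1] and that the stream length (`horizon`) is known in advance. Each element contributes to at most one dyadic partial sum per level. Adding Laplace noise with scale `levels / epsilon` to every released partial sum therefore makes the whole sequence of running counts epsilon-differentially private.

```python
import math
import random
from typing import Optional


class BinaryTreeCounter:
    """Illustrative binary tree mechanism for continual counting.

    Assumptions (hypothetical, not taken from the paper): each stream element
    x_t is in [0, 1], and the stream length (horizon) is known in advance.
    """

    def __init__(self, horizon: int, epsilon: float,
                 rng: Optional[random.Random] = None) -> None:
        self.horizon = horizon
        # Bits needed to write any t in 1..horizon; each x_t lands in at most
        # one dyadic block per level.
        self.levels = max(1, math.ceil(math.log2(horizon + 1)))
        # Laplace scale: the L1 sensitivity of all released blocks is `levels`.
        self.scale = self.levels / epsilon
        self.rng = rng if rng is not None else random.Random()
        self.alpha = [0.0] * self.levels  # exact dyadic partial sums
        self.noisy = [0.0] * self.levels  # released noisy partial sums
        self.t = 0

    def _laplace(self) -> float:
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace(0, scale).
        rate = 1.0 / self.scale
        return self.rng.expovariate(rate) - self.rng.expovariate(rate)

    def update(self, x: float) -> float:
        """Consume x_t and return a private estimate of x_1 + ... + x_t."""
        if self.t >= self.horizon:
            raise ValueError("stream is longer than the declared horizon")
        self.t += 1
        x = min(max(x, 0.0), 1.0)  # clip to enforce the sensitivity bound
        i = (self.t & -self.t).bit_length() - 1  # lowest set bit of t
        # Merge the lower-level blocks and x_t into a new level-i block.
        self.alpha[i] = sum(self.alpha[:i]) + x
        for j in range(i):
            self.alpha[j] = 0.0
            self.noisy[j] = 0.0
        self.noisy[i] = self.alpha[i] + self._laplace()
        # The prefix [1, t] is the union of one block per set bit of t.
        return sum(self.noisy[j] for j in range(self.levels)
                   if (self.t >> j) & 1)
```

With a horizon of 1024, `levels` is 11. Each running count then sums at most 11 Laplace draws, so the error grows only polylogarithmically in the stream length. In a streaming synthetic-data pipeline, counters like this could plausibly track noisy marginal counts between the measure and fit steps. How the paper actually combines counters with the select, measure, fit, and iterate loop is not reproduced here.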
