SynDiffix: More accurate synthetic structured data
Paul Francis, Cristian Berneanu, Edon Gashi

TL;DR
SynDiffix is a method for generating accurate, anonymous synthetic structured data using traditional aggregation, noise addition, and suppression rather than generative models. It outperforms CTGAN in both accuracy and speed, and matches or exceeds the best commercial product measured (MostlyAI).
Contribution
Introduces SynDiffix, an approach that generates synthetic data with traditional anonymization mechanisms (aggregation, noise addition, and suppression) instead of Generative Adversarial Networks or autoencoders, and reports better accuracy and faster execution than CTGAN.
Findings
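ML models generated from SynDiffix data are twice as accurate as those from CTGAN.
Compared to CTGAN, marginal and column-pair data quality is one to two orders of magnitude (10 to 100 times) more accurate.
Execution time is two orders of magnitude faster than CTGAN and one order of magnitude faster than MostlyAI.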
Abstract
This paper introduces SynDiffix, a mechanism for generating statistically accurate, anonymous synthetic data for structured data. Recent open source and commercial systems use Generative Adversarial Networks or Transformed Auto Encoders to synthesize data, and achieve anonymity through overfitting-avoidance. By contrast, SynDiffix exploits traditional mechanisms of aggregation, noise addition, and suppression among others. Compared to CTGAN, ML models generated from SynDiffix are twice as accurate, marginal and column pairs data quality is one to two orders of magnitude more accurate, and execution time is two orders of magnitude faster. Compared to the best commercial product we measured (MostlyAI), ML model accuracy is comparable, marginal and pairs accuracy is 5 to 10 times better, and execution time is an order of magnitude faster. Similar to the other approaches, SynDiffix…
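To make the three mechanisms named in the abstract concrete, here is a minimal Python sketch of aggregation, suppression, and noise addition applied to a single column. It is a generic illustration only, not SynDiffix's actual algorithm, which this summary does not describe. The function names and the `min_count`, `noise_sd`, and `seed` parameters are hypothetical choices made for the example.

```python
import random
from collections import Counter


def noisy_suppressed_counts(values, min_count=3, noise_sd=1.0, seed=0):
    """Aggregate values into counts, drop rare bins, add noise to the rest.

    Illustrative only: thresholds and noise scale are assumptions,
    not parameters taken from SynDiffix.
    """
    rng = random.Random(seed)
    counts = Counter(values)  # aggregation: one count per distinct value
    result = {}
    for value, count in counts.items():
        if count < min_count:
            continue  # suppression: rare values could single out a person
        noisy = round(count + rng.gauss(0.0, noise_sd))  # noise addition
        result[value] = max(noisy, 0)  # a count cannot be negative
    return result


def synthesize(noisy_counts):
    """Emit one synthetic row per unit of noisy count."""
    rows = []
    for value, count in noisy_counts.items():
        rows.extend([value] * count)
    return rows


# Toy column: two common ages and one rare outlier.
ages = [30] * 50 + [41] * 20 + [87]
noisy = noisy_suppressed_counts(ages)
synthetic = synthesize(noisy)
```

In this toy run the single occurrence of `87` falls below `min_count` and never reaches the synthetic output, while the two common values survive with slightly perturbed counts.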
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Traffic Prediction and Management Techniques
