Permutation-Invariant Tabular Data Synthesis
Yujin Zhu, Zilong Zhao, Robert Birke, Lydia Y. Chen

TL;DR
This paper investigates the permutation sensitivity of AI-based tabular data synthesizers and proposes solutions to improve permutation invariance, resulting in higher quality and utility of synthetic data for privacy-preserving data sharing.
Contribution
It introduces AE-GAN and a feature sorting algorithm to address permutation sensitivity in tabular data synthesis, enhancing data quality and utility.
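The summary does not describe how the feature sorting algorithm orders columns. The sketch below is a purely hypothetical illustration of a deliberate, data-driven column order. It greedily chains columns by absolute Pearson correlation so that related features end up adjacent. `greedy_correlation_order` is an invented name, and this heuristic is not the paper's algorithm.

```python
import numpy as np


def greedy_correlation_order(data: np.ndarray) -> list[int]:
    """Return a column order that places strongly correlated columns side by side.

    Hypothetical heuristic for illustration only; NOT the paper's feature sorting
    algorithm. `data` is a numeric array of shape (n_rows, n_columns).
    """
    # Absolute Pearson correlation between columns; constant columns yield NaN -> 0.
    corr = np.nan_to_num(np.abs(np.corrcoef(data, rowvar=False)))
    np.fill_diagonal(corr, 0.0)  # ignore self-correlation
    n_cols = corr.shape[0]
    # Start from the column with the largest total absolute correlation.
    order = [int(np.argmax(corr.sum(axis=1)))]
    remaining = set(range(n_cols)) - set(order)
    while remaining:
        last = order[-1]
        # Append the unplaced column most correlated with the last placed one.
        nxt = max(remaining, key=lambda j: corr[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

An order derived from the data like this is largely independent of the order in which the columns happened to arrive (up to ties), which is the property the paper targets. Whether correlation is the right sorting criterion is an assumption.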
Findings
Permutation sensitivity can reduce data quality by up to 38.67%.
The proposed solutions improve permutation invariance and data utility by up to 22%.
Enhanced synthetic data quality benefits downstream data analysis tasks.
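The summary does not say which statistical-difference metric underlies the 38.67% figure. The sketch below assumes an average per-column 1-D Wasserstein distance and a plain relative change. These are illustrative choices, not confirmed from the paper, and all function names are hypothetical. It shows how the degradation from a permuted column order could be computed.

```python
import numpy as np


def wasserstein_1d(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Empirical 1-D Wasserstein-1 distance for two samples of equal size.

    With equal sample sizes, W1 reduces to the mean absolute difference
    between the sorted samples.
    """
    if real.shape != synthetic.shape:
        raise ValueError("this shortcut assumes equal-size samples")
    return float(np.mean(np.abs(np.sort(real) - np.sort(synthetic))))


def avg_column_distance(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Average per-column W1 distance between two tables of shape (n_rows, n_cols).

    Illustrative metric choice; the paper's exact metric is not stated here.
    """
    return float(np.mean([wasserstein_1d(real[:, j], synthetic[:, j])
                          for j in range(real.shape[1])]))


def relative_increase(baseline: float, permuted: float) -> float:
    """Percentage by which the distance grows when training on permuted columns."""
    if baseline <= 0:
        raise ValueError("baseline distance must be positive")
    return (permuted - baseline) / baseline * 100.0
```

Before comparing, map the columns produced by a synthesizer trained on a permuted table back to the original order, so that column `j` refers to the same feature in both tables.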
Abstract
Tabular data synthesis is an emerging approach to circumvent strict regulations on data privacy while discovering knowledge through big data. Although state-of-the-art AI-based tabular data synthesizers, e.g., table-GAN, CTGAN, TVAE, and CTAB-GAN, are effective at generating synthetic tabular data, their training is sensitive to column permutations of the input data. In this paper, we first conduct an extensive empirical study that exposes this lack of permutation invariance, together with an in-depth analysis of the existing synthesizers. We show that changing the input column order worsens the statistical difference between real and synthetic data by up to 38.67% due to the encoding of tabular data and the network architectures. To fully unleash the potential of big synthetic tabular data, we propose two solutions: (i) AE-GAN, a synthesizer that uses an autoencoder network to represent the…
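The abstract attributes part of the sensitivity to network architectures. As we understand it, table-GAN builds on DCGAN and lays out each record as a square matrix, so a convolution only mixes columns that land next to each other. The sketch below shows how reversing the column order changes which columns share a 3x3 window. The helper names are hypothetical, and the zero-padding layout and 3x3 kernel are illustrative assumptions.

```python
import math

import numpy as np


def row_to_square(row: np.ndarray, pad_value: int = -1) -> np.ndarray:
    """Pad a 1-D record and reshape it into the smallest square matrix that fits it."""
    side = math.ceil(math.sqrt(row.size))
    padded = np.full(side * side, pad_value, dtype=row.dtype)
    padded[: row.size] = row
    return padded.reshape(side, side)


def window_neighbours(layout: np.ndarray, col_id: int, pad_value: int = -1) -> set[int]:
    """Column ids sharing the 3x3 window centred on `col_id` (what one 3x3 kernel sees)."""
    r, c = (int(i) for i in np.argwhere(layout == col_id)[0])
    # Clip the window at the matrix border.
    window = layout[max(r - 1, 0): r + 2, max(c - 1, 0): c + 2]
    return {int(v) for v in window.ravel() if v not in (col_id, pad_value)}


original = row_to_square(np.arange(7))        # column ids 0..6 in their given order
permuted = row_to_square(np.arange(7)[::-1])  # same columns, reversed order
print(window_neighbours(original, 0))  # {1, 3, 4}
print(window_neighbours(permuted, 0))  # {2, 3}
```

The same seven columns produce different local neighbourhoods, so a convolutional synthesizer sees different feature interactions depending purely on column order.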
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
Methods: Convolution · Dense Connections · Batch Normalization · Dropout · Feedforward Network · Deep Convolutional GAN · CTAB-GAN
