Synthesizing Tabular Data Using Selectivity Enhanced Generative Adversarial Networks
Youran Zhou, Jianzhong Qi

TL;DR
This paper presents a GAN-based method for synthesizing tabular data under query selectivity constraints. It improves selectivity estimation accuracy and machine learning utility, so that the synthetic data better reflects the query-processing load that matters in E-commerce stress testing.
Contribution
It introduces a GAN approach that adds query selectivity constraints and uses a pre-trained deep neural network to keep selectivity consistent between real and synthetic data, targeting synthetic data for stress testing.
Findings
Outperforms three state-of-the-art GANs and a VAE model on five real-world datasets.
Improves selectivity estimation accuracy by up to 20%.
Enhances machine learning utility by up to 6%.
Abstract
As E-commerce platforms face surging transactions during major shopping events like Black Friday, stress testing with synthesized data is crucial for resource planning. Most recent studies use Generative Adversarial Networks (GANs) to generate tabular data while ensuring privacy and machine learning utility. However, these methods overlook the computational demands of processing GAN-generated data, making them unsuitable for E-commerce stress testing. This thesis introduces a novel GAN-based approach incorporating query selectivity constraints, a key factor in database transaction processing. We integrate a pre-trained deep neural network to maintain selectivity consistency between real and synthetic data. Our method, tested on five real-world datasets, outperforms three state-of-the-art GANs and a VAE model, improving selectivity estimation accuracy by up to 20% and machine…
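To make "selectivity consistency" concrete: the selectivity of a query is the fraction of rows in a table that satisfy its predicate. The sketch below is only an illustration of that idea, not the authors' method. It computes selectivity exactly by scanning rows, whereas the abstract describes a pre-trained deep neural network in that role. The names `selectivity` and `selectivity_gap`, the inclusive range-query representation, and the toy `price`/`qty` columns are all assumptions made for this example.

```python
from typing import Dict, Sequence, Tuple

Row = Dict[str, float]
# Hypothetical query format: column name -> (low, high), both bounds inclusive.
RangeQuery = Dict[str, Tuple[float, float]]


def selectivity(rows: Sequence[Row], query: RangeQuery) -> float:
    """Fraction of rows that satisfy every range predicate in the query."""
    if not rows:
        return 0.0
    matched = sum(
        1
        for row in rows
        if all(low <= row[col] <= high for col, (low, high) in query.items())
    )
    return matched / len(rows)


def selectivity_gap(
    real: Sequence[Row], synthetic: Sequence[Row], queries: Sequence[RangeQuery]
) -> float:
    """Mean absolute difference in selectivity between real and synthetic data."""
    if not queries:
        return 0.0
    total = sum(
        abs(selectivity(real, q) - selectivity(synthetic, q)) for q in queries
    )
    return total / len(queries)


# Toy data, invented for illustration only.
real_rows = [
    {"price": 10.0, "qty": 1.0},
    {"price": 20.0, "qty": 2.0},
    {"price": 30.0, "qty": 3.0},
    {"price": 40.0, "qty": 4.0},
]
synthetic_rows = [
    {"price": 10.0, "qty": 1.0},
    {"price": 12.0, "qty": 2.0},
    {"price": 20.0, "qty": 3.0},
    {"price": 50.0, "qty": 4.0},
]
price_query: RangeQuery = {"price": (15.0, 35.0)}

gap = selectivity_gap(real_rows, synthetic_rows, [price_query])
```

A gap of zero would mean the synthetic table returns the same share of rows as the real table for every query in the workload. A penalty of this kind is one plausible way to express a selectivity constraint during training, though the source does not specify the exact loss used.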
