Synthesizing real-world distributions from high-dimensional Gaussian Noise with Fully Connected Neural Network
Joanna Komorniczak

TL;DR
This paper introduces a fast, fully connected neural network-based method for generating synthetic data from Gaussian noise that outperforms existing techniques in speed and quality across multiple datasets.
Contribution
It presents a novel, efficient approach using a randomized loss function to transform Gaussian noise into realistic synthetic data, surpassing current state-of-the-art methods.
Findings
Reaches reference MMD scores orders of magnitude faster than modern deep learning solutions.
Surpasses state-of-the-art generative methods on 25 diverse tabular real-world datasets.
Enhances data privacy and classification performance through PCA-based dimensionality reduction.
Abstract
The use of synthetic data in machine learning applications and research offers many benefits, including performance improvements through data augmentation, privacy preservation of original samples, and reliable method assessment with fully synthetic data. This work proposes a time-efficient synthetic data generation method based on a fully connected neural network and a randomized loss function that transforms a random Gaussian distribution to approximate a target real-world dataset. The experiments conducted on 25 diverse tabular real-world datasets confirm that the proposed solution surpasses the state-of-the-art generative methods and achieves reference MMD scores orders of magnitude faster than modern deep learning solutions. The experiments involved analyzing distributional similarity, assessing the impact on classification quality, and using PCA for dimensionality reduction, which…
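The abstract uses MMD (maximum mean discrepancy) scores to measure how closely generated samples match the target data. As a rough illustration of that metric only, the sketch below computes a biased squared-MMD estimate with an RBF kernel in NumPy. The kernel choice, the `gamma` bandwidth, the function names, and the toy Gaussian samples are all assumptions made for this example; the paper's own MMD configuration, network, and randomized loss are not described in this summary and are not reproduced here.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2ab.
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Toy data (assumed, not from the paper): a "real" sample, a sample drawn
# from the same distribution, and a sample drawn from a shifted distribution.
rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, size=(200, 5))
close = rng.normal(loc=0.0, size=(200, 5))
far = rng.normal(loc=3.0, size=(200, 5))

mmd_close = mmd2(real, close, gamma=0.1)
mmd_far = mmd2(real, far, gamma=0.1)
print(f"same distribution:    {mmd_close:.4f}")
print(f"shifted distribution: {mmd_far:.4f}")
```

A lower score means the two samples are harder to tell apart under the chosen kernel, so the shifted sample should score noticeably higher than the sample drawn from the same distribution. The result depends on the kernel bandwidth, which is why scores are only comparable under a fixed configuration.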
