Evaluating the Utility of GAN Generated Synthetic Tabular Data for Class Balancing and Low Resource Settings
Nagarjuna Chereddy, Bharath Kumar Bolla

TL;DR
This study evaluates whether GAN-generated synthetic tabular data can balance classes and improve classification performance in low-resource settings, and reports higher recall than the traditional oversampling methods SMOTE and ADASYN.
Contribution
The paper uses GANs to generate synthetic tabular data for class balancing and low-resource classification, and compares the results with those of SMOTE and ADASYN.
Findings
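A GLM trained on GAN-balanced data achieved the highest recall in the class balancing experiments.
In low-resource settings, Random Forest models trained on data augmented with GAN-synthesized samples achieved better recall than models trained on the original data alone.
GAN outperformed the traditional oversampling methods (SMOTE and ADASYN) in the experiments.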
Abstract
The present study aimed to address the issue of imbalanced data in classification tasks and evaluated the suitability of SMOTE, ADASYN, and GAN techniques in generating synthetic data to address the class imbalance and improve the performance of classification models in low-resource settings. The study employed the Generalised Linear Model (GLM) algorithm for class balancing experiments and the Random Forest (RF) algorithm for low-resource setting experiments to assess model performance under varying training data. Recall was the primary evaluation metric for all classification models. The results of the class balancing experiments showed that the GLM model trained on GAN-balanced data achieved the highest recall value. Similarly, in low-resource experiments, models trained on data enhanced with GAN-synthesized data exhibited better recall values than models trained on the original data alone. These…
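To make the two recurring ideas in the abstract concrete, here is a minimal sketch of (a) the recall metric and (b) SMOTE-style oversampling by interpolating between minority-class neighbours. It is an illustration only, not the authors' pipeline: the helper names `recall` and `smote_like`, the toy data, and the parameter `k` are all assumptions made for this example, and neither ADASYN nor the GAN generator is shown.

```python
import random


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN), computed for the positive (minority) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def smote_like(minority, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (hypothetical helper).

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data (made up for illustration): 4 minority rows, 10 majority rows.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [0.9, 1.9]]
majority_count = 10

# Generate enough synthetic minority rows to match the majority class size.
new_rows = smote_like(minority, majority_count - len(minority))
balanced_minority = minority + new_rows

# Recall on a toy prediction: 2 of the 3 true positives are recovered.
example_recall = recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Recall is a natural primary metric here because it measures how many minority-class cases the model actually finds, which is what class balancing is meant to improve. A GAN-based approach would replace `smote_like` with samples drawn from a trained generator, while the evaluation step stays the same.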
Taxonomy
Methods: GLM · Synthetic Minority Over-sampling Technique (SMOTE)
