Evaluating the Utility of GAN Generated Synthetic Tabular Data for Class Balancing and Low Resource Settings

Nagarjuna Chereddy; Bharath Kumar Bolla

arXiv:2306.13929 · cs.LG · June 27, 2023

TL;DR

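This study evaluates the effectiveness of GAN-generated synthetic tabular data for balancing classes and improving classification performance in low-resource scenarios, finding that it yields better recall than the traditional oversampling methods (SMOTE and ADASYN) for class balancing, and better recall than the original data alone in low-resource settings.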

Contribution

It applies GANs to generate synthetic tabular data for class balancing and for low-resource classification, and compares them with the SMOTE and ADASYN oversampling techniques.
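SMOTE and ADASYN, the two baselines named above, both create minority-class samples by interpolating between nearby minority points; ADASYN additionally assigns more synthetic samples to minority points surrounded by the majority class. The following is a minimal from-scratch sketch of both ideas in plain Python, assuming numeric feature tuples and Euclidean distance. The function names `smote` and `adasyn` and their parameters are hypothetical; this is not the authors' code or any library's API.

```python
import math
import random


def _nearest(i, points, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def _interpolate(a, b, gap):
    """Point on the segment from a to b: gap=0 gives a, gap=1 gives b."""
    return tuple(x + gap * (y - x) for x, y in zip(a, b))


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic sample lies between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(_nearest(i, minority, k))
        out.append(_interpolate(minority[i], minority[j], rng.random()))
    return out


def adasyn(minority, majority, n_new, k=5, seed=0):
    """ADASYN-style oversampling: minority samples with more majority-class
    neighbours (harder to classify) receive more synthetic samples.
    Because per-sample counts are rounded, the total may differ slightly
    from n_new."""
    rng = random.Random(seed)
    m = len(minority)
    pool = minority + majority  # indices < m are minority, >= m are majority
    # Difficulty r_i: share of majority points among the k nearest neighbours.
    ratios = []
    for i in range(m):
        nbrs = _nearest(i, pool, k)
        ratios.append(sum(1 for j in nbrs if j >= m) / len(nbrs))
    total = sum(ratios)
    if total == 0:
        # No minority point borders the majority class: fall back to uniform SMOTE.
        return smote(minority, n_new, k, seed)
    out = []
    for i, r in enumerate(ratios):
        # Normalised difficulty decides how many samples point i receives.
        for _ in range(round(r / total * n_new)):
            j = rng.choice(_nearest(i, minority, k))
            out.append(_interpolate(minority[i], minority[j], rng.random()))
    return out
```

The quadratic neighbour search keeps the sketch dependency-free; in practice a library implementation such as the `SMOTE` and `ADASYN` classes in imbalanced-learn would normally be used instead.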

Findings

01

GAN-based data improved recall in class balancing

02

GAN-synthesized data enhanced model performance in low-resource settings

03

GAN-generated data outperformed the traditional oversampling methods (SMOTE and ADASYN) in the class-balancing experiments

Abstract

The present study aimed to address the issue of imbalanced data in classification tasks and evaluated the suitability of SMOTE, ADASYN, and GAN techniques for generating synthetic data that mitigates class imbalance and improves the performance of classification models in low-resource settings. The study employed the Generalised Linear Model (GLM) algorithm for the class balancing experiments and the Random Forest (RF) algorithm for the low-resource experiments, assessing model performance under varying amounts of training data. Recall was the primary evaluation metric for all classification models. The results of the class balancing experiments showed that the GLM model trained on GAN-balanced data achieved the highest recall value. Similarly, in the low-resource experiments, models trained on data augmented with GAN-synthesized data exhibited better recall than models trained on the original data alone. These…
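The evaluation protocol in the abstract can be sketched as a small harness: recall, TP / (TP + FN), computed on a fixed test set while the training set is subsampled at increasing fractions, with or without synthetic rows appended. This is a plain-Python sketch under stated assumptions: the minority class is taken as the positive label, `low_resource_curve`, `fit_predict`, and `synthesize` are hypothetical names, the classifier callable stands in for the paper's GLM or Random Forest, the generator stands in for a GAN, SMOTE, or ADASYN, and any training fractions passed in are illustrative rather than the paper's.

```python
import random


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN): share of actual positives the model catches."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


def low_resource_curve(train, test, fractions, fit_predict, synthesize=None, seed=0):
    """For each training fraction, fit on a random subsample (optionally
    augmented with synthetic rows) and report recall on a fixed test set.

    train, test: lists of (features, label) pairs.
    fit_predict(train_rows, test_features) -> predicted labels
        (hypothetical interface standing in for GLM or Random Forest).
    synthesize(train_rows) -> extra (features, label) rows
        (hypothetical interface standing in for a GAN, SMOTE or ADASYN).
    """
    rng = random.Random(seed)
    x_test = [x for x, _ in test]
    y_test = [y for _, y in test]
    results = {}
    for frac in fractions:
        # Simulate a low-resource setting by keeping only a fraction of the rows.
        n = max(1, int(frac * len(train)))
        subset = rng.sample(train, n)
        if synthesize is not None:
            subset = subset + synthesize(subset)
        results[frac] = recall(y_test, fit_predict(subset, x_test))
    return results
```

Running the harness once with `synthesize=None` and once with a generator yields the with-versus-without-augmentation comparison; reusing the same `seed` keeps the subsamples identical across both runs.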

Taxonomy

Methods: GLM · Synthetic Minority Over-sampling Technique (SMOTE)