Generative Oversampling for Imbalanced Data via Majority-Guided VAE
Qingzhong Ai, Pengyun Wang, Lirong He, Liangjian Wen, Lujia Pan, Zenglin Xu

TL;DR
This paper introduces MGVAE, a novel generative oversampling method guided by majority class data, which improves minority class sample diversity and reduces overfitting in imbalanced datasets.
Contribution
The paper proposes MGVAE, a majority-guided variational autoencoder that leverages inter-class relationships for better minority data augmentation, with a two-stage training process including pre-training and fine-tuning.
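As a rough illustration of what a "majority-guided" prior could look like, the sketch below evaluates the log-density of a latent point under an equally weighted mixture of diagonal Gaussians, one per encoded majority sample. This mixture form is an assumption made for illustration only; the summary does not state the exact prior used by MGVAE. The function names and the toy latent codes are hypothetical.

```python
import math

def log_normal_diag(z, mean, std):
    """Log-density of z under a diagonal Gaussian with the given mean and std."""
    return sum(
        -0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((zi - m) / s) ** 2
        for zi, m, s in zip(z, mean, std)
    )

def log_mixture_prior(z, components):
    """Log-density of z under an equally weighted mixture of diagonal Gaussians.

    `components` is a list of (mean, std) pairs. In this sketch they stand in
    for the encoder outputs on majority samples (an assumption, not the
    paper's confirmed formulation).
    """
    logs = [log_normal_diag(z, mean, std) for mean, std in components]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs)) - math.log(len(logs))

# Toy 2-D latent codes standing in for encoded majority samples (hypothetical values).
majority_codes = [([0.0, 0.0], [1.0, 1.0]), ([2.0, 2.0], [0.5, 0.5])]

near = log_mixture_prior([0.1, -0.1], majority_codes)
far = log_mixture_prior([8.0, 8.0], majority_codes)
print(near > far)
```

A latent point close to the encoded majority samples receives a higher prior density than a distant one, which is the sense in which such a prior would pull minority encodings toward regions populated by the majority class.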
Findings
MGVAE outperforms existing oversampling methods on benchmark datasets.
The method effectively mitigates overfitting in imbalanced classification.
Experimental results show significant improvements in downstream classification accuracy.
Abstract
Learning with imbalanced data is a challenging problem in deep learning. Over-sampling is a widely used technique to re-balance the sampling distribution of training data. However, most existing over-sampling methods only use intra-class information of minority classes to augment the data but ignore the inter-class relationships with the majority ones, which is prone to overfitting, especially when the imbalance ratio is large. To address this issue, we propose a novel over-sampling model, called Majority-Guided VAE (MGVAE), which generates new minority samples under the guidance of a majority-based prior. In this way, the newly generated minority samples can inherit the diversity and richness of the majority ones, thus mitigating overfitting in downstream tasks. Furthermore, to prevent model collapse under limited data, we first pre-train MGVAE on sufficient majority samples and then…
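To make the re-balancing idea concrete, the sketch below computes the imbalance ratio of a toy label set and the number of synthetic samples each class would need to match the largest class. The function name and the toy labels are hypothetical, and the generator that would actually produce the samples is outside the scope of this sketch.

```python
from collections import Counter

def oversampling_budget(labels):
    """Map each class to the number of synthetic samples needed to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}

# Toy label set: one majority class and two minority classes (hypothetical sizes).
labels = ["maj"] * 100 + ["min_a"] * 10 + ["min_b"] * 4

counts = Counter(labels)
imbalance_ratio = max(counts.values()) / min(counts.values())
budget = oversampling_budget(labels)

print(imbalance_ratio)
print(budget)
```

The larger the imbalance ratio, the more of the re-balanced training set consists of generated samples, which is why the diversity of those samples matters for the downstream classifier.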
Taxonomy
Topics: Imbalanced Data Classification Techniques · COVID-19 diagnosis using AI · Digital Imaging for Blood Diseases
