Handling Extreme Class Imbalance: Using GANs in Data Augmentation for Suicide Prediction
Vaishnavi Visweswaraiah, Tanvi Banerjee, William Romine

TL;DR
This paper addresses extreme class imbalance in suicide prediction by using GANs to generate synthetic samples, which improved the ability of machine learning models to identify positive cases.
Contribution
It introduces a novel application of GANs to data augmentation for suicide prediction, improving model performance even though the original dataset contained only four positive samples.
Findings
GAN-augmented data improved model sensitivity
Random Forest achieved highest overall performance
Synthetic data helped address class imbalance
Abstract
Suicide prediction is key to prevention, but real data with sufficient positive samples is rare, which leads to extreme class imbalance. We used machine learning (ML) to build the models and deep learning (DL) techniques, such as Generative Adversarial Networks (GANs), to generate synthetic samples that augment the dataset. The initial dataset contained 656 samples, only four of which were positive cases, prompting the need for data augmentation. A variety of machine learning models, ranging from interpretable data models to black-box algorithmic models, were used. On real test data, Logistic Regression (LR) achieved a weighted precision of 0.99, a weighted recall of 0.85, and a weighted F1 score of 0.91; Random Forest (RF) achieved 0.98, 0.99, and 0.99, respectively; and Support Vector Machine (SVM) achieved 0.99, 0.76, and 0.86. LR and SVM correctly identified one suicide attempt case…
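The weighted precision, recall, and F1 scores quoted in the abstract average the per-class scores in proportion to each class's support. The following is a minimal sketch of that calculation in plain Python, not the authors' evaluation code. The function names `weighted_scores` and `recall_for` are hypothetical, and the labels are illustrative only: they reuse the 4-in-656 ratio of the initial dataset rather than the paper's actual test split or model outputs.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted (precision, recall, F1) over all classes in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        # True positives and number of predictions for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / n
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        # Each class contributes in proportion to its share of the samples.
        weight = n / total
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


def recall_for(label, y_true, y_pred):
    """Recall (sensitivity) for a single class."""
    n = sum(1 for t in y_true if t == label)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    return tp / n if n else 0.0


# Illustrative labels (an assumption, not the paper's data):
# 4 positives among 656 samples.
y_true = [1] * 4 + [0] * 652
# A trivial baseline that predicts "negative" for every sample.
y_pred = [0] * 656

w_precision, w_recall, w_f1 = weighted_scores(y_true, y_pred)
positive_recall = recall_for(1, y_true, y_pred)
print(f"weighted P={w_precision:.3f} R={w_recall:.3f} F1={w_f1:.3f}")
print(f"positive-class recall={positive_recall:.3f}")
```

With these illustrative labels the all-negative baseline reaches a weighted recall of 652/656 (about 0.99) while finding none of the positive cases, because the majority class carries almost all of the weight. This is why the number of correctly identified positive cases is worth reading alongside the weighted scores.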
Taxonomy
Topics: Mental Health via Writing · Suicide and Self-Harm Studies · Mental Health Treatment and Access
