A Novel GAN Approach to Augment Limited Tabular Data for Short-Term Substance Use Prediction
Nguyen Thach, Patrick Habecker, Bergen Johnston, Lillianna Cervantes, Anika Eisenbraun, Alex Mason, Kimberly Tyler, Bilal Khan, Hau Chan

TL;DR
This paper introduces a novel GAN-based method for augmenting limited tabular survey data on persons who use drugs (PWUDs), improving short-term substance use prediction across several drugs.
Contribution
The paper presents a new GAN approach tailored for high-dimensional, low-sample-size tabular data with survey skip logic, enhancing predictive models for substance use behaviors.
Findings
Augmented data improved AUROC by up to 13.4% for usage increase prediction.
Augmented data improved AUROC by up to 15.8% for usage frequency prediction.
The proposed GAN outperformed existing state-of-the-art generative models.
Abstract
Substance use is a global issue that negatively impacts millions of persons who use drugs (PWUDs). In practice, identifying vulnerable PWUDs for efficient allocation of appropriate resources is challenging due to their complex use patterns (e.g., their tendency to change usage within months) and the high acquisition costs for collecting PWUD-focused substance use data. Thus, there has been a paucity of machine learning models for accurately predicting short-term substance use behaviors of PWUDs. In this paper, using longitudinal survey data of 258 PWUDs in the U.S. Great Plains collected by our team, we design a novel GAN that deals with high-dimensional low-sample-size tabular data and survey skip logic to augment existing data to improve classification models' prediction on (A) whether the PWUDs would increase usage and (B) at which ordinal frequency they would use a particular drug…
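To make "survey skip logic" concrete: in a survey, a gate question (for example, whether a drug was used at all) determines whether follow-up questions are asked, so a valid record must leave those follow-ups blank when the gate answer is "no". The sketch below only illustrates that consistency constraint as a post-hoc cleanup on a synthetic row. It is not the authors' GAN, which the abstract says handles skip logic as part of its design; the question names and the rule table are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical skip rules: each gate question maps to the follow-up questions
# that are skipped when the gate answer is "no". Names are illustrative only.
SKIP_RULES: Dict[str, List[str]] = {
    "used_drug_past_30d": ["days_used_past_30d", "usual_route"],
}

Row = Dict[str, Optional[object]]


def apply_skip_logic(row: Row, rules: Dict[str, List[str]] = SKIP_RULES) -> Row:
    """Return a copy of `row` with follow-ups blanked when their gate is 'no'."""
    cleaned = dict(row)  # shallow copy so the input row is left untouched
    for gate, followups in rules.items():
        if cleaned.get(gate) == "no":
            for question in followups:
                cleaned[question] = None  # None marks a skipped question
    return cleaned


# A synthetic row that violates skip logic: the gate says "no",
# yet the follow-up questions carry answers.
synthetic = {
    "used_drug_past_30d": "no",
    "days_used_past_30d": 12,
    "usual_route": "oral",
}
consistent = apply_skip_logic(synthetic)
```

A generator that ignores this structure can emit rows like `synthetic` above, which no real respondent could produce. That is one reason skip logic matters when augmenting survey data.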
Taxonomy
Topics: Advanced Data Processing Techniques · Air Quality Monitoring and Forecasting
