Generating Synthetic Clinical Data that Capture Class Imbalanced Distributions with Generative Adversarial Networks: Example using Antiretroviral Therapy for HIV
Nicholas I-Hsien Kuo, Federico Garcia, Anders Sönnerborg, Maurizio Zazzi, Michael Böhm, Rolf Kaiser, Mark Polizzotto, Louisa Jorm, and Sebastiano Barbieri

TL;DR
This paper extends the GAN framework with a VAE and an external memory to generate diverse, realistic synthetic clinical data that accurately reflect imbalanced class distributions, using HIV antiretroviral therapy data as the example.
Contribution
The authors extend the classic GAN with a VAE and an external memory that replays latent features from real samples to the generator, addressing mode collapse and producing high-quality synthetic healthcare data.
Findings
Overcomes mode collapse in GANs for clinical data
Generates synthetic data with accurate class imbalance representation
Maintains low patient disclosure risk and high utility
Abstract
Clinical data usually cannot be freely distributed due to their highly confidential nature, and this hampers the development of machine learning in the healthcare domain. One way to mitigate this problem is by generating realistic synthetic datasets using generative adversarial networks (GANs). However, GANs are known to suffer from mode collapse, thus creating outputs of low diversity. This lowers the quality of the synthetic healthcare data, and may cause it to omit patients of minority demographics or neglect less common clinical practices. In this paper, we extend the classic GAN setup with an additional variational autoencoder (VAE) and include an external memory to replay latent features observed from the real samples to the GAN generator. Using antiretroviral therapy for human immunodeficiency virus (ART for HIV) as a case study, we show that our extended setup overcomes mode…
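The replay idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `LatentMemory`, the function `generator_input`, the buffer capacity, the latent and noise dimensions, and the choice to concatenate replayed latents with Gaussian noise are all hypothetical, and random vectors stand in for the VAE encoder outputs.

```python
import random
from collections import deque


class LatentMemory:
    """Fixed-capacity store of latent features (hypothetical helper)."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest entry once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def write(self, latents):
        # Store a copy of each latent vector observed from real samples.
        self.buffer.extend(list(z) for z in latents)

    def replay(self, n, rng):
        # Sample n stored latents uniformly, with replacement.
        if not self.buffer:
            raise ValueError("memory is empty")
        return rng.choices(list(self.buffer), k=n)


def generator_input(memory, n, noise_dim, rng):
    # Assumption: each replayed latent is concatenated with Gaussian noise
    # before being passed to the generator.
    rows = []
    for z in memory.replay(n, rng):
        noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        rows.append(list(z) + noise)
    return rows


rng = random.Random(0)
memory = LatentMemory(capacity=100)

# Stand-in for VAE encoder outputs on a batch of 32 real records.
fake_latents = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(32)]
memory.write(fake_latents)

batch = generator_input(memory, n=16, noise_dim=4, rng=rng)
print(len(batch), len(batch[0]))  # 16 12
```

Because every generator input is anchored to a latent that came from a real record, rare records stored in the memory keep being presented to the generator, which is one plausible reading of how replay could counter mode collapse.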
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare · AI in cancer detection
