MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation
Yuan Zhong, Suhan Cui, Jiaqi Wang, Xiaochen Wang, Ziyi Yin, Yaqing Wang, Houping Xiao, Mengdi Huai, Ting Wang, Fenglong Ma

TL;DR
MedDiffusion is a diffusion-based model that improves health risk prediction from EHR data by generating synthetic training data and capturing hidden relationships between patient visits. It outperforms existing methods.
Contribution
This paper introduces MedDiffusion, a novel end-to-end diffusion-based model that improves risk prediction through synthetic data augmentation and attention mechanisms, addressing data insufficiency in medical datasets.
Findings
Outperforms 14 baseline models in PR-AUC, F1, and Cohen's Kappa
Demonstrates effectiveness across four real-world datasets
Provides insights into data interpretability and model robustness
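For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how PR-AUC, F1, and Cohen's Kappa can be computed for a binary risk-prediction task. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions, and PR-AUC is approximated here by average precision, which is one common estimator.

```python
# Illustrative metric computations for binary labels (1 = at risk, 0 = not).
# All names and the toy data are assumptions, not taken from the paper.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal frequencies of each class.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

def average_precision(y_true, scores):
    """Average precision, a common estimator of PR-AUC.

    Ranks examples by descending score and averages the precision
    measured at the rank of each positive. Tied scores are not
    handled specially in this sketch.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos

# Toy example: six patients with true labels and predicted risk scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("PR-AUC (average precision):", average_precision(y_true, scores))
print("F1:", f1_score(y_true, y_pred))
print("Cohen's Kappa:", cohen_kappa(y_true, y_pred))
```

PR-AUC is computed from the raw scores, while F1 and Cohen's Kappa depend on the chosen threshold. All three are more informative than accuracy when positive cases are rare, which is typical of risk-prediction datasets.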
Abstract
Health risk prediction is one of the fundamental tasks under predictive modeling in the medical domain, which aims to forecast the potential health risks that patients may face in the future using their historical Electronic Health Records (EHR). Researchers have developed several risk prediction models to handle the unique challenges of EHR data, such as its sequential nature, high dimensionality, and inherent noise. These models have yielded impressive results. Nonetheless, a key issue undermining their effectiveness is data insufficiency. A variety of data generation and augmentation methods have been introduced to mitigate this issue by expanding the size of the training data set through the learning of underlying data distributions. However, the performance of these methods is often limited due to their task-unrelated design. To address these shortcomings, this paper introduces a…
Taxonomy
Topics: Machine Learning in Healthcare · Topic Modeling
