Tipping the Balance: Impact of Class Imbalance Correction on the Performance of Clinical Risk Prediction Models
Amalie Koch Andersen, Hadi Mehdizavareh, Arijit Khan, Tobias Becher, Simone Britsch, Markward Britsch, Morten Bøttcher, Simon Winther, Palle Duun Rohde, Morten Hasselstrøm Jensen, Simon Lebech Cichosz

TL;DR
This study systematically evaluated the impact of class imbalance correction techniques on clinical risk prediction models, finding that these methods do not improve discrimination and often worsen calibration across diverse real-world datasets.
Contribution
It provides comprehensive evidence that common resampling strategies fail to improve model discrimination and instead impair calibration in clinical risk prediction tasks.
Findings
Resampling strategies showed no consistent improvement in ROC-AUC.
Imbalance correction degraded calibration and increased Brier scores.
Models with imbalance correction exhibited systematic risk prediction distortions.
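One way to see why 1:1 resampling can distort predicted risks is the textbook prior-shift argument. This is an illustration, not the paper's analysis. It assumes an idealized, well-specified model and that resampling changes only the class prior, in which case the odds are multiplied by (1 - prevalence) / prevalence. The function name `resampled_risk` and the 5% prevalence are hypothetical choices made for this sketch.

```python
def resampled_risk(true_risk, prevalence):
    """Risk an idealized model would report after 1:1 resampling.

    Assumption (not from the paper): resampling only changes the class
    prior, so the odds are scaled by (1 - prevalence) / prevalence.
    """
    odds = true_risk / (1.0 - true_risk)
    shifted_odds = odds * (1.0 - prevalence) / prevalence
    return shifted_odds / (1.0 + shifted_odds)


prevalence = 0.05  # hypothetical outcome prevalence
for true_risk in (0.01, 0.05, 0.20):
    inflated = resampled_risk(true_risk, prevalence)
    print(f"true risk {true_risk:.2f} -> reported risk {inflated:.3f}")
```

Under these assumptions every reported risk is pushed upward, and a patient at exactly the population prevalence is reported at 50%. The ranking of patients is unchanged, which is consistent with discrimination staying flat while calibration degrades.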
Abstract
Objective: ML-based clinical risk prediction models are increasingly used to support decision-making in healthcare. While class-imbalance correction techniques are commonly applied to improve model performance in settings with rare outcomes, their impact on probabilistic calibration remains insufficiently understood. This study evaluated the effect of widely used resampling strategies on both discrimination and calibration across real-world clinical prediction tasks. Methods: Ten clinical datasets spanning diverse medical domains and including 605,842 patients were analyzed. Multiple machine-learning model families, including linear models and several non-linear approaches, were evaluated. Models were trained on the original data and under three commonly used 1:1 class-imbalance correction strategies (SMOTE, RUS, ROS). Performance was assessed on held-out data using discrimination and…
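The evaluation design described in the abstract can be sketched on a toy scale: fit on the original data and on a 1:1 undersampled copy, then compare discrimination (ROC-AUC) and calibration-sensitive error (Brier score) on held-out data. Everything here is an assumption made for illustration: the synthetic cohort, the single binary risk factor, the group-frequency "model", and the helper names. It is not the authors' pipeline, and it covers only random undersampling (RUS).

```python
import random


def brier(y, p):
    """Mean squared difference between predicted risk and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def roc_auc(y, p):
    """Probability that a random event is ranked above a random non-event."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def make_cohort():
    """Hypothetical cohort of (risk_factor, outcome) rows, 5.6% prevalence."""
    return [(1, 1)] * 40 + [(1, 0)] * 160 + [(0, 1)] * 16 + [(0, 0)] * 784


def fit_group_rates(rows):
    """Toy model: predicted risk is the event rate within each risk group."""
    rates = {}
    for group in (0, 1):
        outcomes = [y for x, y in rows if x == group]
        rates[group] = sum(outcomes) / len(outcomes)
    return rates


def undersample(rows, rng):
    """Random undersampling to 1:1: keep all events, sample equal non-events."""
    events = [r for r in rows if r[1] == 1]
    non_events = [r for r in rows if r[1] == 0]
    return events + rng.sample(non_events, len(events))


train = make_cohort()
test = make_cohort()  # same construction; stands in for a held-out set
y_test = [y for _, y in test]

results = {}
training_sets = {
    "original": train,
    "RUS 1:1": undersample(train, random.Random(0)),
}
for name, data in training_sets.items():
    rates = fit_group_rates(data)
    preds = [rates[x] for x, _ in test]
    mean_pred = sum(preds) / len(preds)
    results[name] = (roc_auc(y_test, preds), brier(y_test, preds), mean_pred)
    print(f"{name}: AUC={results[name][0]:.3f} "
          f"Brier={results[name][1]:.4f} mean risk={mean_pred:.3f}")
```

In this toy setup the ranking of the two risk groups survives undersampling, so ROC-AUC does not move, while the mean predicted risk rises well above the observed prevalence and the Brier score increases. Real models and datasets will behave less cleanly than this two-group example.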
Taxonomy
Topics: Machine Learning in Healthcare · Imbalanced Data Classification Techniques · Artificial Intelligence in Healthcare and Education
