Tipping the Balance: Impact of Class Imbalance Correction on the Performance of Clinical Risk Prediction Models

Amalie Koch Andersen; Hadi Mehdizavareh; Arijit Khan; Tobias Becher; Simone Britsch; Markward Britsch; Morten Bøttcher; Simon Winther; Palle Duun Rohde; Morten Hasselstrøm Jensen; Simon Lebech Cichosz

arXiv:2603.00208·q-bio.QM·March 3, 2026

Open Access

TL;DR

This study systematically evaluated the impact of class imbalance correction techniques on clinical risk prediction models, finding that these methods do not improve discrimination and often worsen calibration across diverse real-world datasets.

Contribution

The study provides evidence across ten clinical datasets that common resampling strategies do not improve model discrimination and instead impair calibration in clinical risk prediction tasks.

Findings

01. Resampling strategies showed no consistent improvement in ROC-AUC.

02. Imbalance correction degraded calibration and increased Brier scores.

03. Models with imbalance correction exhibited systematic risk prediction distortions.
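One way to see why balancing can distort predicted risks is the standard prior-shift argument, sketched below. This is an illustration under stated assumptions, not the paper's analysis: it assumes a model that learns the true risk perfectly on the resampled data, and it models undersampling as keeping every event and keeping each non-event with probability `beta`. The function names `balancing_keep_fraction` and `resampled_risk` are hypothetical.

```python
def balancing_keep_fraction(prevalence):
    """Fraction of non-events to keep so that events and non-events end up 1:1."""
    return prevalence / (1.0 - prevalence)


def resampled_risk(p, beta):
    """Risk seen in the resampled data for a patient whose true risk is p.

    Keeping every event and each non-event with probability beta divides the
    non-event mass by 1/beta, so the odds are multiplied by 1/beta.
    """
    return p / (p + beta * (1.0 - p))


# Hypothetical cohort with a 5% event rate, balanced to 1:1.
beta = balancing_keep_fraction(0.05)
for true_risk in (0.01, 0.05, 0.10, 0.30):
    print(f"true risk {true_risk:.2f} -> reported risk {resampled_risk(true_risk, beta):.3f}")
```

Under these assumptions a patient at the cohort's average risk of 5% is reported at about 50%, and every other risk is inflated as well. The ranking of patients is unchanged, which is consistent with discrimination staying flat while calibration degrades.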

Abstract

Objective: ML-based clinical risk prediction models are increasingly used to support decision-making in healthcare. While class-imbalance correction techniques are commonly applied to improve model performance in settings with rare outcomes, their impact on probabilistic calibration remains insufficiently understood. This study evaluated the effect of widely used resampling strategies on both discrimination and calibration across real-world clinical prediction tasks.

Methods: Ten clinical datasets spanning diverse medical domains and including 605,842 patients were analyzed. Multiple machine-learning model families, including linear models and several non-linear approaches, were evaluated. Models were trained on the original data and under three commonly used 1:1 class-imbalance correction strategies (SMOTE, RUS, ROS). Performance was assessed on held-out data using discrimination and…
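The evaluation design in the abstract (resample the training data only, then score on untouched held-out data) can be sketched in plain Python. This is a minimal toy under loud assumptions, not the authors' pipeline: the data are synthetic, SMOTE is omitted, and `fit_event_rate_model` is a hypothetical stand-in that predicts the training event rate for every patient instead of a real model family. All function names are illustrative.

```python
import random


def split_by_class(y):
    """Return (minority, majority) lists of row indices for binary labels."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def random_undersample(X, y, rng):
    """RUS: keep all minority rows and an equal-sized random subset of the majority."""
    minority, majority = split_by_class(y)
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, rng):
    """ROS: duplicate minority rows (with replacement) until the classes are 1:1."""
    minority, majority = split_by_class(y)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    keep = majority + minority + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def fit_event_rate_model(X_train, y_train):
    """Toy stand-in for a risk model: predicts the training event rate for everyone."""
    rate = sum(y_train) / len(y_train)
    return lambda X: [rate for _ in X]


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, p_pred)) / len(y_true)


rng = random.Random(0)
# Synthetic cohort with a 5% event rate in both the training and held-out sets.
y_train = [1] * 50 + [0] * 950
X_train = [[float(i)] for i in range(len(y_train))]
y_test = [1] * 50 + [0] * 950
X_test = [[float(i)] for i in range(len(y_test))]

results = {}
strategies = [("original", None), ("RUS", random_undersample), ("ROS", random_oversample)]
for name, resample in strategies:
    # Resampling touches the training data only; the held-out set keeps its true prevalence.
    X_fit, y_fit = (X_train, y_train) if resample is None else resample(X_train, y_train, rng)
    model = fit_event_rate_model(X_fit, y_fit)
    results[name] = brier_score(y_test, model(X_test))
    print(name, round(results[name], 4))
```

In this toy setup the held-out Brier score rises from about 0.0475 on the original data to 0.25 under either 1:1 strategy, because the balanced training set makes the model report a 50% risk in a population where 5% of patients have the event. A real experiment would swap in actual model families and also report ROC-AUC and calibration curves.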

Taxonomy

Topics: Machine Learning in Healthcare · Imbalanced Data Classification Techniques · Artificial Intelligence in Healthcare and Education