Model-agnostic Mitigation Strategies of Data Imbalance for Regression

Jelke Wibbeke; Sebastian Rohjans; Andreas Rauh

arXiv:2506.01486·cs.LG·June 3, 2025

Open Access

TL;DR

This paper reviews and advances data imbalance mitigation in regression, introducing novel methods and relevance functions, and demonstrates their effectiveness through extensive benchmarking on diverse datasets.

Contribution

It proposes new imbalance mitigation techniques (cSMOGN and crbSMOGN), relevance functions for better data importance assessment, and an ensemble approach to improve predictive performance.

Findings

1. crbSMOGN with density-ratio relevance outperforms existing methods
2. Ensemble models reduce negative effects on frequent data points
3. Most strategies improve rare sample prediction but may harm frequent sample accuracy

Abstract

Data imbalance persists as a pervasive challenge in regression tasks, introducing bias in model performance and undermining predictive reliability. This is particularly detrimental in applications aimed at predicting rare events that fall outside the domain of the bulk of the training data. In this study, we review the current state of the art in sampling-based methods and cost-sensitive learning. Additionally, we propose novel approaches to mitigate model bias. To better assess the importance of data, we introduce the density-distance and density-ratio relevance functions, which effectively integrate empirical frequency of data with domain-specific preferences, offering enhanced interpretability for end-users. Furthermore, we present advanced mitigation techniques (cSMOGN and crbSMOGN), which build upon and improve existing sampling methods. In a comprehensive quantitative…
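
The general idea of cost-sensitive learning with a frequency-based relevance can be illustrated with a minimal sketch. This is not the paper's density-distance or density-ratio relevance function, whose definitions are not given on this page. It assumes a simple inverse-density relevance built from a Gaussian kernel density estimate over the target values. The names `gaussian_kde`, `inverse_density_relevance`, `weighted_mse`, and the parameters `bandwidth` and `floor` are hypothetical and chosen only for illustration.

```python
import math


def gaussian_kde(targets, bandwidth):
    """Return a function that estimates the density of 1-D target values."""
    n = len(targets)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(
            math.exp(-0.5 * ((y - t) / bandwidth) ** 2) for t in targets
        )

    return density


def inverse_density_relevance(targets, bandwidth=1.0):
    """Hypothetical relevance in [0, 1]: 1 for the rarest target value,
    0 for the most frequent one (min-max scaled inverse density)."""
    density = gaussian_kde(targets, bandwidth)
    d = [density(y) for y in targets]
    lo, hi = min(d), max(d)
    if hi == lo:
        # All targets are equally frequent, so none is treated as rare.
        return [0.0] * len(targets)
    return [(hi - v) / (hi - lo) for v in d]


def weighted_mse(y_true, y_pred, relevance, floor=0.1):
    """Mean squared error weighted by relevance.

    The assumed `floor` keeps frequent samples from being ignored entirely.
    """
    weights = [floor + r for r in relevance]
    total = sum(weights)
    return sum(
        w * (a - b) ** 2 for w, a, b in zip(weights, y_true, y_pred)
    ) / total


# Five targets clustered near 1.0 and one rare target at 5.0.
targets = [1.0, 1.1, 0.9, 1.0, 1.05, 5.0]
relevance = inverse_density_relevance(targets, bandwidth=1.0)

# The same absolute error costs more on the rare sample than on a frequent one.
miss_rare = targets[:-1] + [targets[-1] - 1.0]
miss_frequent = [targets[0] - 1.0] + targets[1:]
loss_rare = weighted_mse(targets, miss_rare, relevance)
loss_frequent = weighted_mse(targets, miss_frequent, relevance)
```

In this sketch, a model trained against `weighted_mse` is penalized more for errors on rare targets. That also hints at the trade-off listed in the findings: the more the weighting favors rare samples, the less the loss rewards accuracy on frequent ones.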

Taxonomy

Topics: Fault Detection and Control Systems