Model-agnostic Mitigation Strategies of Data Imbalance for Regression
Jelke Wibbeke, Sebastian Rohjans, Andreas Rauh

TL;DR
This paper reviews and advances data imbalance mitigation in regression, introducing novel methods and relevance functions, and demonstrates their effectiveness through extensive benchmarking on diverse datasets.
Contribution
The paper proposes new imbalance mitigation techniques (cSMOGN and crbSMOGN), two relevance functions (density-distance and density-ratio) for assessing the importance of data, and an ensemble approach to improve predictive performance.
Findings
crbSMOGN with density-ratio relevance outperforms existing methods
Ensemble models reduce negative effects on frequent data points
Most strategies improve rare sample prediction but may harm frequent sample accuracy
Abstract
Data imbalance persists as a pervasive challenge in regression tasks, introducing bias in model performance and undermining predictive reliability. This is particularly detrimental in applications aimed at predicting rare events that fall outside the domain of the bulk of the training data. In this study, we review the current state of the art in sampling-based methods and cost-sensitive learning. Additionally, we propose novel approaches to mitigate model bias. To better assess the importance of data, we introduce the density-distance and density-ratio relevance functions, which effectively integrate the empirical frequency of data with domain-specific preferences, offering enhanced interpretability for end-users. Furthermore, we present advanced mitigation techniques (cSMOGN and crbSMOGN), which build upon and improve existing sampling methods. In a comprehensive quantitative…
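The general idea behind density-based relevance and cost-sensitive learning can be illustrated with a minimal sketch. It is not the paper's density-distance or density-ratio function, whose definitions are not given here. It assumes a simple inverse-density relevance built on a Gaussian kernel density estimate of the targets, and all function names, the bandwidth, and the weight floor are hypothetical choices made for illustration.

```python
import math


def kde_density(targets, bandwidth):
    """Gaussian kernel density estimate evaluated at each target value."""
    n = len(targets)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((y - t) / bandwidth) ** 2) for t in targets)
        for y in targets
    ]


def inverse_density_relevance(targets, bandwidth=0.5):
    """Assumed relevance: 1.0 for the rarest target, 0.0 for the most frequent."""
    dens = kde_density(targets, bandwidth)
    lo, hi = min(dens), max(dens)
    if hi == lo:
        # All targets are equally frequent, so treat them as equally relevant.
        return [1.0] * len(targets)
    return [(hi - d) / (hi - lo) for d in dens]


def weighted_mse(y_true, y_pred, relevance, floor=0.1):
    """Cost-sensitive loss: squared errors weighted by relevance plus a floor.

    The floor (an assumption) keeps frequent samples from being ignored.
    """
    weights = [floor + r for r in relevance]
    total = sum(weights)
    return sum(
        w * (a - b) ** 2 for w, a, b in zip(weights, y_true, y_pred)
    ) / total


# Four targets near 1.0 and one rare target at 5.0.
targets = [1.0, 1.1, 0.9, 1.0, 5.0]
relevance = inverse_density_relevance(targets)

# A model that always predicts 1.0 is penalized more by the weighted loss
# than by the plain mean squared error, because the rare sample dominates.
predictions = [1.0] * len(targets)
plain = weighted_mse(targets, predictions, [1.0] * len(targets), floor=0.0)
weighted = weighted_mse(targets, predictions, relevance)
```

Weighting the loss this way trades accuracy on frequent samples for accuracy on rare ones, which is the trade-off the findings above describe for most mitigation strategies.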
Taxonomy
Topics: Fault Detection and Control Systems
