Sampling Imbalanced Data with Multi-objective Bilevel Optimization
Karen Medlin, Sven Leyffer, Krishnan Raghavan

TL;DR
This paper introduces MOODS, a multi-objective bilevel optimization framework for sampling imbalanced data. It improves minority-class classification by increasing the diversity of the sampled data and reports F1 score gains of 1-15%.
Contribution
The paper presents MOODS, a novel multi-objective bilevel optimization method for data sampling, and a new diversification metric to evaluate sampling quality.
Findings
Achieves a 1-15% increase in F1 scores.
Demonstrates state-of-the-art performance in imbalanced classification.
Introduces a new metric for assessing sampling diversity.
Abstract
Two-class classification problems are often characterized by an imbalance between the number of majority and minority datapoints, resulting in poor classification of the minority class in particular. Traditional approaches, such as reweighting the loss function or naïve resampling, risk overfitting and subsequently fail to improve classification because they do not consider the diversity between majority and minority datasets. Such consideration is infeasible because there is no metric that can measure the impact of imbalance on the model. To obviate these challenges, we make two key contributions. First, we introduce MOODS (Multi-Objective Optimization for Data Sampling), a novel multi-objective bilevel optimization framework that guides both synthetic oversampling and majority undersampling. Second, we introduce a validation metric -- ` non-overlapping…
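For readers unfamiliar with the baseline the abstract argues against, the following is a minimal sketch of naïve random oversampling together with the F1 score used to report results. It is not the MOODS method. The function names `random_oversample` and `f1_score`, the toy data, and the seed are all illustrative assumptions and do not come from the paper.

```python
import random
from collections import Counter


def random_oversample(X, y, minority_label=1, seed=0):
    """Naive baseline (not MOODS): duplicate minority points, chosen
    uniformly at random with replacement, until both classes are equal in size."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    if not minority:
        raise ValueError("no minority points to resample")
    n_majority = len(y) - len(minority)
    n_extra = max(0, n_majority - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(X) + extra, list(y) + [minority_label] * n_extra


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive (minority) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 8 majority points (label 0), 2 minority points (label 1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [2.0], [2.1]]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
counts = Counter(y_bal)

# The classes are now balanced, but the minority class still holds only the
# two original distinct points: duplication adds no diversity.
unique_minority = {tuple(x) for x, label in zip(X_bal, y_bal) if label == 1}
```

In this sketch the resampled set is balanced in count only. Every added minority point is a copy of an existing one, which is the kind of overfitting risk the abstract attributes to naïve resampling.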
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Algorithms · Text and Document Classification Technologies
