Sampling Imbalanced Data with Multi-objective Bilevel Optimization

Karen Medlin; Sven Leyffer; Krishnan Raghavan

arXiv:2506.11315 · cs.LG · July 11, 2025

TL;DR

This paper introduces MOODS, a multi-objective bilevel optimization framework for sampling imbalanced data. It improves minority-class classification by increasing sample diversity and reports F1 score gains of 1-15%.

Contribution

The paper presents MOODS, a novel multi-objective bilevel optimization method for data sampling, and a new diversification metric to evaluate sampling quality.

Findings

Achieves a 1-15% increase in F1 scores.

Demonstrates state-of-the-art performance in imbalanced classification.

Introduces a new metric for assessing sampling diversity.
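
The findings are reported in terms of F1 score rather than accuracy. The sketch below is not from the paper; it uses made-up toy labels (95 majority, 5 minority) and hypothetical helper names (`f1_score`, `accuracy`) to illustrate why accuracy can look strong on imbalanced data while the minority-class F1 score exposes poor minority classification.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 score for the class labeled `positive` (here, the minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (assumed for illustration): 95 majority (0), 5 minority (1).
y_true = [0] * 95 + [1] * 5

# Predictor A ignores the minority class entirely.
always_majority = [0] * 100

# Predictor B raises 2 false alarms, finds 3 of 5 minority points, misses 2.
some_minority = [0] * 93 + [1] * 2 + [1] * 3 + [0] * 2

acc_a, f1_a = accuracy(y_true, always_majority), f1_score(y_true, always_majority)
acc_b, f1_b = accuracy(y_true, some_minority), f1_score(y_true, some_minority)

print(f"A: accuracy={acc_a:.2f}, minority F1={f1_a:.2f}")
print(f"B: accuracy={acc_b:.2f}, minority F1={f1_b:.2f}")
```

On this toy data the two predictors differ by only one point of accuracy, yet their minority-class F1 scores are far apart, which is why F1 is a common yardstick for imbalanced classification.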

Abstract

Two-class classification problems are often characterized by an imbalance between the number of majority and minority datapoints resulting in poor classification of the minority class in particular. Traditional approaches, such as reweighting the loss function or naïve resampling, risk overfitting and subsequently fail to improve classification because they do not consider the diversity between majority and minority datasets. Such consideration is infeasible because there is no metric that can measure the impact of imbalance on the model. To obviate these challenges, we make two key contributions. First, we introduce MOODS (Multi-Objective Optimization for Data Sampling), a novel multi-objective bilevel optimization framework that guides both synthetic oversampling and majority undersampling. Second, we introduce a validation metric -- '$\epsilon/\delta$ non-overlapping…

Taxonomy

Topics: Imbalanced Data Classification Techniques · Machine Learning and Algorithms · Text and Document Classification Technologies