Handling Imbalanced Datasets Through Optimum-Path Forest

Leandro Aparecido Passos; Danilo S. Jodas; Luiz C. F. Ribeiro; Marco; Akio; Andre Nunes de Souza; João Paulo Papa

arXiv:2202.08934·cs.LG·February 21, 2022

TL;DR

This paper introduces three Optimum-Path Forest (OPF)-based strategies for class imbalance in machine learning datasets: oversampling, undersampling, and a hybrid of the two. The proposed methods are reported to outperform state-of-the-art techniques on various datasets.

Contribution

The paper proposes new OPF-based methods for handling imbalanced datasets, including oversampling, undersampling, and hybrid strategies, with variants that outperform existing techniques.
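To make the three strategy types concrete, the following minimal sketch rebalances a labeled dataset by plain random resampling. It is an illustrative baseline only, not the paper's method: the OPF-based variants use the forest structure to decide which samples to synthesize or remove, which is not reproduced here. The function name `resample`, its `strategy` values, and the midpoint target used for the hybrid case are all assumptions made for illustration.

```python
import random
from collections import Counter


def resample(samples, labels, strategy="hybrid", seed=0):
    """Rebalance classes by random resampling (hypothetical helper).

    strategy="over":   duplicate samples until every class matches the largest.
    strategy="under":  drop samples until every class matches the smallest.
    strategy="hybrid": move every class to the midpoint of the two sizes
                       (an assumed target, chosen only for illustration).
    """
    rng = random.Random(seed)

    # Group samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    sizes = [len(xs) for xs in by_class.values()]
    if strategy == "over":
        target = max(sizes)
    elif strategy == "under":
        target = min(sizes)
    elif strategy == "hybrid":
        target = (max(sizes) + min(sizes)) // 2
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target:
            # Undersample: keep a random subset without replacement.
            chosen = rng.sample(xs, target)
        else:
            # Oversample: keep all samples, then add random duplicates.
            chosen = xs + rng.choices(xs, k=target - len(xs))
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


# Toy dataset: 90 majority-class samples and 10 minority-class samples.
X = list(range(100))
y = [0] * 90 + [1] * 10
for name in ("over", "under", "hybrid"):
    _, y_bal = resample(X, y, strategy=name)
    print(name, dict(Counter(y_bal)))
```

Random duplication adds no new information and random removal can discard informative samples, which is the kind of limitation that structure-aware resampling approaches aim to address.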

Findings

01 Proposed methods outperform state-of-the-art techniques on various datasets.

02 The hybrid strategy effectively balances class distribution and improves accuracy.

03 OPF-based approaches show robustness across multiple applications.

Abstract

In the last decade, machine learning-based approaches have become capable of performing a wide range of complex tasks, sometimes better than humans and in a fraction of the time. This advance is partly due to the exponential growth in the amount of available data, which makes it possible to extract trustworthy real-world information from it. However, such data is generally imbalanced, since some phenomena are more likely than others. This imbalance strongly affects a machine learning model's performance, since the model becomes biased toward the more frequent data it receives. Among the many machine learning methods available, one graph-based approach, the Optimum-Path Forest (OPF), has attracted considerable attention due to its outstanding performance across many applications. In this paper, we propose three OPF-based strategies to deal with the imbalance…

Code & Models

Repositories

leandropassosjr/opfimb
Official
