TL;DR
This paper introduces three Optimum-Path Forest (OPF)-based strategies for class imbalance in machine learning datasets: an oversampling method, an undersampling method, and a hybrid of the two. Variants of these strategies are reported to outperform existing techniques.
Contribution
The paper proposes OPF-based oversampling, undersampling, and hybrid strategies for handling imbalanced datasets, together with variants that are reported to outperform existing techniques.
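For readers unfamiliar with the three strategy families named above, the sketch below shows what oversampling, undersampling, and a hybrid of the two do to class counts. It is not the paper's OPF-based method: it swaps in plain random resampling as a stand-in, and every function name (`random_oversample`, `random_undersample`, `hybrid_resample`) and the choice of the mean class size as the hybrid target are assumptions made purely for illustration.

```python
import random
from collections import Counter


def _by_class(samples, labels):
    """Group samples by their class label, preserving order."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority samples until every class matches the largest."""
    rng = random.Random(seed)
    groups = _by_class(samples, labels)
    target = max(len(pool) for pool in groups.values())
    out_x, out_y = [], []
    for cls, pool in groups.items():
        extra = [rng.choice(pool) for _ in range(target - len(pool))]
        out_x.extend(pool + extra)
        out_y.extend([cls] * target)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Drop random majority samples until every class matches the smallest."""
    rng = random.Random(seed)
    groups = _by_class(samples, labels)
    target = min(len(pool) for pool in groups.values())
    out_x, out_y = [], []
    for cls, pool in groups.items():
        out_x.extend(rng.sample(pool, target))
        out_y.extend([cls] * target)
    return out_x, out_y


def hybrid_resample(samples, labels, seed=0):
    """Shrink large classes and grow small ones toward the mean class size.

    The mean-size target is an illustrative assumption, not the paper's rule.
    """
    rng = random.Random(seed)
    groups = _by_class(samples, labels)
    target = round(sum(len(p) for p in groups.values()) / len(groups))
    out_x, out_y = [], []
    for cls, pool in groups.items():
        if len(pool) >= target:
            chosen = rng.sample(pool, target)
        else:
            chosen = pool + [rng.choice(pool) for _ in range(target - len(pool))]
        out_x.extend(chosen)
        out_y.extend([cls] * target)
    return out_x, out_y


# Toy imbalanced dataset: nine samples of class 0, three of class 1.
X = list(range(12))
y = [0] * 9 + [1] * 3

over_counts = Counter(random_oversample(X, y)[1])
under_counts = Counter(random_undersample(X, y)[1])
hybrid_counts = Counter(hybrid_resample(X, y)[1])
```

On this toy input, oversampling grows the minority class to nine samples, undersampling shrinks the majority class to three, and the hybrid moves both classes to six. Judging from the summary, the paper's contribution lies in using OPF to decide which samples to generate or discard, which this random baseline does not attempt.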
Findings
Proposed methods outperform state-of-the-art techniques on various datasets.
The hybrid strategy effectively balances class distribution and improves accuracy.
OPF-based approaches show robustness across multiple applications.
Abstract
In the last decade, machine learning-based approaches have become capable of performing a wide range of complex tasks, sometimes better than humans and in a fraction of the time. This advance is partly due to the exponential growth in the amount of available data, which makes it possible to extract trustworthy real-world information. However, such data is generally imbalanced, since some phenomena are more likely than others. This imbalance considerably affects a machine learning model's performance, since the model becomes biased toward the more frequent data it receives. Among the many machine learning methods available, one graph-based approach, the Optimum-Path Forest (OPF), has attracted considerable attention due to its outstanding performance across many applications. In this paper, we propose three OPF-based strategies to deal with the imbalance…
