Kernel density estimation based sampling for imbalanced class distribution
Firuz Kamalov

TL;DR
This paper proposes a sampling method based on kernel density estimation (KDE) to address class imbalance, and reports improved performance over standard sampling techniques across a range of datasets and classifiers.
Contribution
It introduces a KDE-based approach to oversampling the minority class. The author argues that it is less prone to overfitting than standard sampling techniques, and the experiments indicate that it can outperform them on imbalanced classification tasks.
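To make the idea concrete, here is a minimal sketch of KDE-based oversampling. It is not the paper's implementation: the function name `kde_oversample`, the Gaussian kernel, and the Scott's-rule bandwidth are all assumptions chosen for illustration. It relies on the fact that drawing from a Gaussian KDE amounts to picking an observed point at random and adding kernel-shaped noise.

```python
import numpy as np

def kde_oversample(X_min, n_new, rng=None):
    """Draw n_new synthetic rows from a Gaussian KDE fitted to X_min.

    X_min: array of shape (n_samples, n_features) with minority-class rows.
    Hypothetical helper for illustration; kernel and bandwidth are assumed.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape
    # Scott's rule of thumb for the bandwidth factor (an assumed choice).
    factor = n ** (-1.0 / (d + 4))
    # Kernel covariance: sample covariance of the data scaled by factor**2.
    cov = np.atleast_2d(np.cov(X_min, rowvar=False)) * factor ** 2
    # Sampling from the KDE: choose a data point uniformly, add kernel noise.
    centers = X_min[rng.integers(0, n, size=n_new)]
    noise = rng.multivariate_normal(np.zeros(d), cov, size=n_new)
    return centers + noise

# Usage on made-up data: 20 minority rows, 80 synthetic rows added.
data_rng = np.random.default_rng(0)
X_minority = data_rng.normal(loc=[2.0, -1.0], scale=0.5, size=(20, 2))
X_synthetic = kde_oversample(X_minority, n_new=80, rng=1)
X_balanced_minority = np.vstack([X_minority, X_synthetic])
```

Unlike duplicating existing rows, every synthetic row here is a new point drawn from a smooth density estimate, which is the intuition behind the reduced-overfitting claim.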
Findings
KDE sampling outperforms standard methods in F1-score and G-mean.
The method is effective across multiple datasets and classifiers.
Results are consistent regardless of class distribution ratio.
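For readers unfamiliar with the two metrics named above, the following sketch computes them from binary confusion-matrix counts. It assumes the usual definitions (F1 as the harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity); the summary does not spell out the paper's exact formulas, and the function name and the counts in the usage lines are made up.

```python
import math

def f1_and_gmean(tp, fp, fn, tn):
    """Return (F1, G-mean) for the positive (minority) class.

    Assumes the usual definitions; zero denominators yield 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    gmean = math.sqrt(recall * specificity)
    return f1, gmean

# Usage with made-up counts for an imbalanced test set of 100 rows.
f1, gmean = f1_and_gmean(tp=8, fp=4, fn=2, tn=86)
```

Both metrics depend on minority-class recall, so a classifier that labels everything as the majority class scores zero on each despite high accuracy.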
Abstract
Imbalanced response variable distribution is a common occurrence in data science. In fields such as fraud detection, medical diagnostics, system intrusion detection, and many others where abnormal behavior is rarely observed, the data under study often feature a disproportionate target class distribution. One common way to combat class imbalance is to resample the minority class to achieve a more balanced distribution. In this paper, we investigate the performance of a sampling method based on kernel density estimation (KDE). We believe that KDE offers a more natural way of generating new instances of the minority class and is less prone to overfitting than other standard sampling techniques. It is based on a well-established theory of nonparametric statistical estimation. Numerical experiments show that KDE can outperform other sampling techniques on a range of real-life datasets as…
