Kernel density estimation-based sampling for neural network classification

Firuz Kamalov; Ashraf Elnagar

arXiv:2110.12644 · cs.LG · October 26, 2021

TL;DR

This paper evaluates a kernel density estimation (KDE)-based sampling method for neural network classification on imbalanced datasets. It shows that the method often outperforms baseline resampling techniques, but that it should be used with caution on image data.

Contribution

It benchmarks a recently proposed KDE sampling technique in the context of neural networks, demonstrating its effectiveness on imbalanced data compared with two baseline sampling techniques.

Findings

1. KDE sampling outperforms the two baseline sampling methods on 6 of the 8 datasets.
2. KDE sampling can significantly improve neural network performance on imbalanced data.
3. KDE sampling should be applied with caution to image datasets.

Abstract

Imbalanced data occurs in a wide range of scenarios. The skewed distribution of the target variable induces bias in machine learning algorithms. One of the popular methods to combat imbalanced data is to artificially balance the data through resampling. In this paper, we compare the efficacy of a recently proposed kernel density estimation (KDE) sampling technique in the context of artificial neural networks. We benchmark the KDE sampling method against two baseline sampling techniques and perform comparative experiments using 8 datasets and 3 neural network architectures. The results show that KDE sampling produces the best performance on 6 out of 8 datasets. However, it must be used with caution on image datasets. We conclude that KDE sampling is capable of significantly improving the performance of neural networks.
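The resampling idea described in the abstract can be illustrated with a short sketch. This is not the authors' implementation. It assumes that KDE sampling balances a binary dataset by fitting a Gaussian KDE to the minority class and drawing synthetic minority points from that estimated density. The Gaussian kernel, the Scott's-rule bandwidth, and the function name `kde_oversample` are hypothetical choices made only for this example.

```python
import numpy as np


def kde_oversample(X, y, minority_label, rng=None):
    """Balance a binary dataset by drawing synthetic minority samples from a
    Gaussian KDE fitted to the minority class.

    Illustrative sketch only: the kernel (Gaussian) and bandwidth rule
    (Scott's rule) are assumptions, not the paper's exact settings.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    X_min = X[y == minority_label]
    n_min, d = X_min.shape
    n_maj = int(np.sum(y != minority_label))
    n_new = n_maj - n_min
    if n_new <= 0:
        return X, y  # nothing to add: the given class is not the minority
    if n_min < 2:
        raise ValueError("need at least 2 minority samples to fit a KDE")

    # Scott's rule bandwidth factor (the default used by scipy.stats.gaussian_kde).
    factor = n_min ** (-1.0 / (d + 4))
    # Kernel covariance = sample covariance of the minority class scaled by factor^2.
    kernel_cov = np.atleast_2d(np.cov(X_min, rowvar=False)) * factor**2

    # Sampling from a Gaussian KDE: choose a minority point uniformly at random,
    # then perturb it with noise drawn from the Gaussian kernel.
    centers = X_min[rng.integers(0, n_min, size=n_new)]
    noise = rng.multivariate_normal(np.zeros(d), kernel_cov, size=n_new)
    X_new = centers + noise

    X_bal = np.vstack([X, X_new])
    y_bal = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_bal, y_bal


if __name__ == "__main__":
    # Toy 90:10 imbalanced dataset with two features.
    gen = np.random.default_rng(0)
    X = np.vstack([gen.normal(0.0, 1.0, size=(90, 2)),
                   gen.normal(3.0, 0.5, size=(10, 2))])
    y = np.array([0] * 90 + [1] * 10)

    X_bal, y_bal = kde_oversample(X, y, minority_label=1, rng=0)
    print(np.bincount(y_bal))  # [90 90]
```

Because each synthetic point is a stored minority point plus kernel noise, the new samples stay close to the observed minority data, and the bandwidth controls how far they can spread. The balanced arrays could then be passed to any neural network classifier for training.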