Nearest Neighbor Classification based on Imbalanced Data: A Statistical Approach

Anvit Garg; Anil K. Ghosh; Soham Sarkar

arXiv:2206.10866 · stat.ME · November 2, 2023

TL;DR

This paper introduces a statistically grounded nearest neighbor classifier for imbalanced data sets. The method avoids oversampling and undersampling, is proven Bayes risk consistent under appropriate regularity conditions, and outperforms existing methods in the authors' empirical comparisons.

Contribution

The paper presents a new nearest neighbor classification method for imbalanced data that requires no resampling, and it proves the method's Bayes risk consistency under appropriate regularity conditions.

Findings

01 Outperforms existing methods on benchmark datasets

02 No pseudo observations or data removal needed

03 Proven Bayes risk consistency

Abstract

When the competing classes in a classification problem are not of comparable size, many popular classifiers exhibit a bias towards larger classes, and the nearest neighbor classifier is no exception. To take care of this problem, we develop a statistical method for nearest neighbor classification based on such imbalanced data sets. First, we construct a classifier for the binary classification problem and then extend it for classification problems involving more than two classes. Unlike the existing oversampling or undersampling methods, our proposed classifiers do not need to generate any pseudo observations or remove any existing observations, hence the results are exactly reproducible. We establish the Bayes risk consistency of these classifiers under appropriate regularity conditions. Their superior performance over the existing methods is amply demonstrated by analyzing several…
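The bias described in the abstract can be seen in a small toy example. The sketch below is not the authors' classifier, which the abstract does not specify. It only contrasts a plain k-nearest-neighbor majority vote with a generic class-size-normalized vote, used here purely for illustration. The one-dimensional data, the function names, and the choice k = 5 are all assumptions.

```python
from collections import Counter


def knn_votes(train, query, k):
    """Count the class labels among the k training points nearest to query (1-D)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    return Counter(label for _, label in nearest)


def plain_knn(train, query, k):
    """Standard k-NN: predict the label with the most votes."""
    votes = knn_votes(train, query, k)
    return max(votes, key=votes.get)


def size_normalized_knn(train, query, k):
    """Illustrative correction (an assumption, not the paper's method):
    divide each class's vote count by that class's training size."""
    sizes = Counter(label for _, label in train)
    votes = knn_votes(train, query, k)
    return max(votes, key=lambda label: votes[label] / sizes[label])


# Hypothetical imbalanced data: 40 majority points (label 0), 4 minority points (label 1).
majority = [(float(x), 0) for x in range(40)]
minority = [(float(x), 1) for x in (30, 35, 40, 45)]
train = majority + minority

query = 35.5  # lies inside the region where the minority class is observed
print("plain k-NN:", plain_knn(train, query, k=5))
print("size-normalized k-NN:", size_normalized_knn(train, query, k=5))
```

In this toy data the five nearest neighbors of the query contain four majority points and one minority point, so the plain vote picks the majority class even though the minority class has the higher vote share relative to its size. Unlike oversampling or undersampling, a reweighting of this kind involves no random draws, so its output is the same on every run.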

Code & Models

Repositories

anvit25/imb-nn
Official

Taxonomy

Topics: Imbalanced Data Classification Techniques · Face and Expression Recognition · Text and Document Classification Technologies