Nearest Neighbor Classification based on Imbalanced Data: A Statistical Approach
Anvit Garg, Anil K. Ghosh, Soham Sarkar

TL;DR
This paper introduces a statistically grounded nearest neighbor classifier for imbalanced data sets. The method needs no data augmentation or resampling, is proved to be Bayes risk consistent, and is reported to outperform existing methods in empirical comparisons.
Contribution
The paper presents a new nearest neighbor classification method for imbalanced data that requires no resampling, and it proves that the method is Bayes risk consistent under appropriate regularity conditions.
Findings
Outperforms existing methods on benchmark datasets
No pseudo observations or data removal needed
Proven Bayes risk consistency
Abstract
When the competing classes in a classification problem are not of comparable size, many popular classifiers exhibit a bias towards larger classes, and the nearest neighbor classifier is no exception. To take care of this problem, we develop a statistical method for nearest neighbor classification based on such imbalanced data sets. First, we construct a classifier for the binary classification problem and then extend it for classification problems involving more than two classes. Unlike the existing oversampling or undersampling methods, our proposed classifiers do not need to generate any pseudo observations or remove any existing observations, hence the results are exactly reproducible. We establish the Bayes risk consistency of these classifiers under appropriate regularity conditions. Their superior performance over the existing methods is amply demonstrated by analyzing several…
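The bias towards larger classes mentioned in the abstract can be seen in a small toy example. The sketch below is illustrative only and is not the classifier proposed in the paper: it compares a plain majority-vote k-nearest-neighbor rule with a simple, commonly used adjustment that divides each class's vote count by its training-set size. The data, the function names, and the choice k = 5 are all assumptions made for this illustration.

```python
from collections import Counter


def k_nearest_labels(train, query, k):
    """Return the labels of the k training points closest to `query` (1-D data)."""
    ranked = sorted(train, key=lambda point: abs(point[0] - query))
    return [label for _, label in ranked[:k]]


def majority_vote_knn(train, query, k):
    """Plain k-NN: predict the label with the most votes among the k neighbors."""
    votes = Counter(k_nearest_labels(train, query, k))
    return max(votes, key=votes.get)


def size_normalized_knn(train, query, k):
    """Illustrative adjustment (NOT the paper's method): divide each class's
    vote count by that class's size in the training set before comparing."""
    class_sizes = Counter(label for _, label in train)
    votes = Counter(k_nearest_labels(train, query, k))
    return max(votes, key=lambda label: votes[label] / class_sizes[label])


# Hypothetical toy data: 20 points from class 0 on [0, 1.9], 2 points from class 1.
majority = [(i * 0.1, 0) for i in range(20)]
minority = [(2.0, 1), (2.2, 1)]
train = majority + minority

query = 2.1  # sits between the two minority points
plain = majority_vote_knn(train, query, k=5)
adjusted = size_normalized_knn(train, query, k=5)
print(plain, adjusted)
```

With these toy points, the five nearest neighbors of the query are the two minority observations and three majority observations, so the plain vote picks the larger class even though the query lies between the two minority points. The size-normalized vote picks the minority class. Like the approach described in the abstract, this adjustment neither generates pseudo observations nor discards data, but the paper's actual construction and its consistency result are not reproduced here.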
Taxonomy
Topics: Imbalanced Data Classification Techniques · Face and Expression Recognition · Text and Document Classification Technologies
