Choice of neighbor order in nearest-neighbor classification
Peter Hall, Byeong U. Park, Richard J. Samworth

TL;DR
This paper investigates how the choice of neighbor order $k$ affects the misclassification error in nearest-neighbor classification, analyzing Poisson and Binomial models and proposing new methods for selecting $k$.
Contribution
It provides a detailed analysis of the influence of $k$ on error rates and introduces novel techniques for empirically choosing the optimal neighbor order.
Findings
Risk and regret of the $k$th-nearest neighbor classifier are asymptotically equivalent under the Poisson and Binomial models.
The properties align with kernel-based classifiers for two derivatives.
New methods for selecting $k$ improve classification performance.
Abstract
The $k$th-nearest neighbor rule is arguably the simplest and most intuitively appealing nonparametric classification procedure. However, application of this method is inhibited by lack of knowledge about its properties, in particular, about the manner in which it is influenced by the value of $k$; and by the absence of techniques for empirical choice of $k$. In the present paper we detail the way in which the value of $k$ determines the misclassification error. We consider two models, Poisson and Binomial, for the training samples. Under the first model, data are recorded in a Poisson stream and are "assigned" to one or other of the two populations in accordance with the prior probabilities. In particular, the total number of data in both training samples is a Poisson-distributed random variable. Under the Binomial model, however, the total number of data in the training samples is…
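The two sampling models described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure for choosing $k$; it only shows, under stated assumptions, how the training-sample size is random under the Poisson model and fixed under the Binomial model, and how the misclassification error of a $k$-nearest-neighbor rule can be estimated for several values of $k$. The one-dimensional Gaussian class densities (means $-1$ and $+1$, unit variance) and the function names `sample_training`, `knn_predict`, and `error_rate` are hypothetical choices made for illustration.

```python
import numpy as np


def sample_training(n, prior, rng, model="binomial"):
    """Draw a two-class training sample (illustrative assumptions only).

    model="poisson":  the total sample size is Poisson-distributed with mean n.
    model="binomial": the total sample size is fixed at n.
    In both cases each point is assigned to class 1 with probability `prior`.
    """
    if model == "poisson":
        total = int(rng.poisson(n))
    elif model == "binomial":
        total = n
    else:
        raise ValueError("model must be 'poisson' or 'binomial'")
    y = (rng.random(total) < prior).astype(int)
    # Assumed class densities: N(-1, 1) for class 0 and N(+1, 1) for class 1.
    x = rng.normal(loc=np.where(y == 1, 1.0, -1.0), scale=1.0)
    return x, y


def knn_predict(train_x, train_y, test_x, k):
    """Majority vote among the k nearest training points (1-D features).

    Use odd k to avoid ties; a tied vote is assigned to class 0 here.
    """
    if k < 1 or k > len(train_x):
        raise ValueError("k must lie between 1 and the training-sample size")
    dist = np.abs(test_x[:, None] - train_x[None, :])
    nearest = np.argpartition(dist, k - 1, axis=1)[:, :k]
    votes = train_y[nearest].mean(axis=1)
    return (votes > 0.5).astype(int)


def error_rate(k, n, prior, rng, model, n_test=2000):
    """Monte Carlo estimate of the misclassification error for one training draw."""
    train_x, train_y = sample_training(n, prior, rng, model)
    test_x, test_y = sample_training(n_test, prior, rng, "binomial")
    return float(np.mean(knn_predict(train_x, train_y, test_x, k) != test_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for k in (1, 5, 25, 101):
        e_pois = error_rate(k, 200, 0.5, rng, "poisson")
        e_bin = error_rate(k, 200, 0.5, rng, "binomial")
        print(f"k={k:3d}  Poisson model: {e_pois:.3f}  Binomial model: {e_bin:.3f}")
```

Each call to `error_rate` uses a single training draw, so the printed values are noisy; averaging over many draws for each $k$ would give a steadier picture of how the error varies with the neighbor order under the two models.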
