Optimal choice of $k$ for $k$-nearest neighbor regression
Mona Azadkia

TL;DR
This paper proves that choosing the number of neighbors $k$ in $k$-NN regression by leave-one-out cross-validation (LOOCV) yields an estimator whose mean squared error is, with high probability, close to the minimum achievable over all choices of $k$.
Contribution
It establishes that, with high probability, the LOOCV-selected $k$ in $k$-NN regression is near-optimal in terms of mean squared error.
Findings
LOOCV-selected $k$ achieves near-minimal mean squared error.
Theoretical proof of the near-optimality of LOOCV in $k$-NN.
Supports the practical use of LOOCV for choosing $k$.
Abstract
The $k$-nearest neighbor algorithm ($k$-NN) is a widely used non-parametric method for classification and regression. We study the mean squared error of the $k$-NN estimator when $k$ is chosen by leave-one-out cross-validation (LOOCV). Although it was known that this choice of $k$ is asymptotically consistent, it was not known previously that it is an optimal $k$. We show, with high probability, the mean squared error of this estimator is close to the minimum mean squared error using the $k$-NN estimate, where the minimum is over all choices of $k$.
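The selection procedure studied in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the function names `loocv_knn_mse` and `select_k_loocv` are hypothetical, Euclidean distance is an assumption, and distance ties are broken by index order. For each candidate $k$, every point is predicted by the average response of its $k$ nearest neighbors among the other points, and the $k$ with the smallest leave-one-out squared error is kept.

```python
import numpy as np

def loocv_knn_mse(X, y, k_values):
    """Leave-one-out MSE of k-NN regression for each k in k_values.

    Assumes Euclidean distance and 1 <= k <= n - 1 (illustrative choices).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diffs, diffs)

    # A point must not count as its own neighbor (the "leave-one-out" part).
    np.fill_diagonal(dist, np.inf)

    # Row i lists the other points from nearest to farthest; the point
    # itself (distance inf) is sorted into the last column.
    order = np.argsort(dist, axis=1, kind="stable")
    sorted_y = y[order][:, : n - 1]

    # Column k-1 holds the mean response of the k nearest neighbors.
    running_means = np.cumsum(sorted_y, axis=1) / np.arange(1, n)

    return {k: float(np.mean((y - running_means[:, k - 1]) ** 2))
            for k in k_values}

def select_k_loocv(X, y, k_values):
    """Return the k minimizing the leave-one-out MSE, plus all scores."""
    scores = loocv_knn_mse(X, y, k_values)
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Usage on synthetic data (the data-generating model is an assumption).
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(200, 2))
y_demo = np.sin(3.0 * X_demo[:, 0]) + 0.3 * rng.normal(size=200)
k_hat, demo_scores = select_k_loocv(X_demo, y_demo, range(1, 51))
```

The sketch computes all pairwise distances once and reuses the sorted neighbor order for every $k$, so scanning the whole candidate range costs little more than evaluating a single $k$. The memory use is quadratic in the sample size, which is acceptable for an illustration but not for large data sets.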
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Sparse and Compressive Sensing Techniques
