Optimal choice of $k$ for $k$-nearest neighbor regression

Mona Azadkia

arXiv:1909.05495 · math.ST · February 18, 2020 · 1 citation

TL;DR

This paper proves that selecting the number of neighbors $k$ in $k$-NN regression via leave-one-out cross-validation yields an estimator with mean squared error close to the best possible, confirming its near-optimality.

Contribution

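It establishes that, with high probability, choosing $k$ by LOOCV gives a $k$-NN regression estimator whose mean squared error is close to the minimum over all choices of $k$. This is a stronger guarantee than the previously known asymptotic consistency of the LOOCV choice.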
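One standard way to write the objects involved is below. It is a sketch under stated assumptions: i.i.d. data from $Y = f(X) + \varepsilon$, Euclidean nearest neighbors, and $\mathrm{MSE}(k)$ taken as the error at a fresh point $X$. The paper's own definitions (notation, tie-breaking, and which MSE is used) may differ.

```latex
% Notation assumed for illustration; the paper's may differ.
% Data (X_1, Y_1), ..., (X_n, Y_n) i.i.d. with Y_i = f(X_i) + \varepsilon_i.
\hat f_{n,k}(x) = \frac{1}{k} \sum_{j \in N_k(x)} Y_j
\qquad \text{($N_k(x)$: indices of the $k$ sample points nearest to $x$)}

\hat f^{(-i)}_{n,k}(X_i) = \frac{1}{k} \sum_{j \in N^{(-i)}_k(X_i)} Y_j
\qquad \text{(nearest neighbors of $X_i$ among the other $n-1$ points)}

\mathrm{CV}(k) = \frac{1}{n} \sum_{i=1}^{n} \bigl(Y_i - \hat f^{(-i)}_{n,k}(X_i)\bigr)^2,
\qquad
\hat k = \operatorname*{arg\,min}_{1 \le k \le n-1} \mathrm{CV}(k)

\mathrm{MSE}(k) = \mathbb{E}\bigl[(\hat f_{n,k}(X) - f(X))^2\bigr]
```

In this notation, the result says that with high probability $\mathrm{MSE}(\hat k)$ is close to $\min_k \mathrm{MSE}(k)$. The precise sense of "close" is given in the paper and is not reproduced here.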

Findings

01

LOOCV-selected $k$ achieves near-minimal mean squared error.

02

Theoretical proof of the near-optimality of LOOCV in $k$-NN.

03

Supports the practical use of LOOCV for choosing $k$.
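A minimal NumPy sketch of the LOOCV selection rule follows. It assumes Euclidean distance, brute-force neighbor search, and distance ties broken by sample index; the paper may handle ties differently. The function names `loocv_errors` and `choose_k_loocv` are hypothetical, not taken from the paper. Sorting each point's neighbors once and taking cumulative sums of their responses gives the leave-one-out prediction for every $k$ in a single pass.

```python
import numpy as np


def loocv_errors(X, y, k_max):
    """LOOCV mean squared error of k-NN regression for k = 1, ..., k_max.

    X: (n, d) covariates, y: (n,) responses, 1 <= k_max <= n - 1.
    Entry k - 1 of the returned array is CV(k).
    """
    n = X.shape[0]
    if not 1 <= k_max <= n - 1:
        raise ValueError("k_max must be between 1 and n - 1")
    # Pairwise squared Euclidean distances (brute force; an assumption).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Leave one out: a point may not be its own neighbor.
    np.fill_diagonal(sq, np.inf)
    # Neighbors of each point, nearest first; the stable sort breaks
    # distance ties by sample index (an assumed tie-breaking rule).
    order = np.argsort(sq, axis=1, kind="stable")[:, :k_max]
    # Cumulative sums turn the sorted responses into k-NN averages for every k.
    preds = np.cumsum(y[order], axis=1) / np.arange(1, k_max + 1)
    return np.mean((y[:, None] - preds) ** 2, axis=0)


def choose_k_loocv(X, y, k_max):
    """Return the LOOCV-selected k (smallest k on ties) and the CV curve."""
    errs = loocv_errors(X, y, k_max)
    return int(np.argmin(errs)) + 1, errs


# Usage on synthetic 1-D data (hypothetical example).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.standard_normal(200)
k_hat, errs = choose_k_loocv(X, y, k_max=50)
print("LOOCV-selected k:", k_hat)
```

The pairwise-distance array needs $O(n^2 d)$ memory, so for large $n$ a tree-based or approximate neighbor search would be the practical choice.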

Abstract

The $k$-nearest neighbor algorithm ($k$-NN) is a widely used non-parametric method for classification and regression. We study the mean squared error of the $k$-NN estimator when $k$ is chosen by leave-one-out cross-validation (LOOCV). Although it was known that this choice of $k$ is asymptotically consistent, it was not previously known that it is an optimal $k$. We show that, with high probability, the mean squared error of this estimator is close to the minimum mean squared error achievable by the $k$-NN estimate, where the minimum is over all choices of $k$.
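The abstract's claim can be illustrated, though not proved, by a small simulation in which the true regression function is known. Knowing it lets us estimate the MSE of every $k$ and compare the LOOCV choice with the oracle $k$ that minimizes it. Everything here is an assumption for illustration: the function `f`, the uniform design, Gaussian noise with standard deviation 0.3, the sample sizes, and estimating MSE on 2000 fresh test points. The helper names `knn_predict` and `loocv_k` are hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Brute-force k-NN regression: mean response of the k nearest training points."""
    d = np.sum((X_query[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    idx = np.argsort(d, axis=1, kind="stable")[:, :k]
    return y_train[idx].mean(axis=1)


def loocv_k(X, y, k_max):
    """LOOCV-selected k among 1..k_max (requires k_max <= n - 1)."""
    d = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude each point from its own neighbors
    idx = np.argsort(d, axis=1, kind="stable")[:, :k_max]
    preds = np.cumsum(y[idx], axis=1) / np.arange(1, k_max + 1)
    return int(np.argmin(np.mean((y[:, None] - preds) ** 2, axis=0))) + 1


def f(x):
    """Hypothetical true regression function, known only because this is a simulation."""
    return np.sin(2 * np.pi * x[:, 0])


rng = np.random.default_rng(1)
n, k_max, sigma = 300, 60, 0.3
X = rng.uniform(size=(n, 1))
y = f(X) + sigma * rng.standard_normal(n)
X_test = rng.uniform(size=(2000, 1))

# Monte Carlo estimate of MSE(k) = E[(f_hat_k(X) - f(X))^2] on fresh test points.
mse = np.array([np.mean((knn_predict(X, y, X_test, k) - f(X_test)) ** 2)
                for k in range(1, k_max + 1)])
k_cv = loocv_k(X, y, k_max)
k_star = int(np.argmin(mse)) + 1  # oracle k: needs f, unavailable in practice
ratio = mse[k_cv - 1] / mse[k_star - 1]
print(f"LOOCV k = {k_cv}, oracle k = {k_star}, MSE ratio = {ratio:.3f}")
```

A `ratio` near 1 is consistent with the near-optimality result. A single finite-sample run is only an illustration and says nothing about the theorem's constants or rates.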

Taxonomy

Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Sparse and Compressive Sensing Techniques