Efficient Estimation of k for the Nearest Neighbors Class of Methods
Aleksander Lodwich, Faisal Shafait, Thomas Breuel

TL;DR
This paper proposes an efficient, systematic method for estimating the optimal number of neighbors (k) in kNN algorithms, avoiding much of the computational expense of the usual cross-validation approach.
Contribution
The paper introduces an approach that uses large matrices to estimate k efficiently, avoiding repeated distance calculations on large, high-dimensional datasets.
Findings
The method reduces computational cost compared with cross-validating every candidate value of k.
It estimates k effectively in practical scenarios.
The approach scales with dataset size and dimensionality.
Abstract
The k Nearest Neighbors (kNN) method has received much attention in the past decades, during which some theoretical bounds on its performance were identified and practical optimizations were proposed to make it work fairly well in high-dimensional spaces and on large datasets. Countless past experiments have made it widely accepted that the value of k has a significant impact on the performance of this method. However, the efficient optimization of this parameter has not received much attention in the literature. Today, the most common approach is to cross-validate or bootstrap this value for all values in question. This approach forces distances to be recomputed many times, even if efficient methods are used. Hence, estimating the optimal k can become expensive even on modern systems. Frequently, this circumstance leads to a sparse manual search of k. In this paper we want to…
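The cost described in the abstract can be made concrete with a small sketch. This is not the authors' method, which the truncated abstract does not spell out. It only illustrates one common way to avoid recomputing distances: sort each point's neighbors once, then score every candidate k from that single ordering. The function name `loo_error_for_all_k`, the toy data, the Euclidean metric, the leave-one-out protocol, and the tie-breaking rule are all assumptions made for this example.

```python
import math
from collections import Counter


def loo_error_for_all_k(points, labels, k_max):
    """Leave-one-out error rate for every k in 1..k_max.

    Hypothetical helper for illustration: distances from each point to all
    others are computed and sorted once, then reused for every k.
    """
    n = len(points)
    k_max = min(k_max, n - 1)  # a point has at most n - 1 neighbors
    wrong = [0] * (k_max + 1)  # wrong[k] = misclassified points; index 0 unused

    for i in range(n):
        # One pass over the data: distance from point i to every other point.
        neighbors = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        votes = Counter()
        for k in range(1, k_max + 1):
            # Moving from k - 1 to k adds exactly one vote; nothing is recomputed.
            votes[labels[neighbors[k - 1][1]]] += 1
            # Assumed tie-break: the label that entered the vote first wins.
            predicted = max(votes, key=votes.get)
            if predicted != labels[i]:
                wrong[k] += 1

    return {k: wrong[k] / n for k in range(1, k_max + 1)}


# Toy data (made up): two well-separated 1-D clusters.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
labels = ["a", "a", "a", "b", "b", "b"]

error_by_k = loo_error_for_all_k(points, labels, k_max=5)
# min() returns the first minimum, so ties resolve to the smallest k.
best_k = min(error_by_k, key=error_by_k.get)
```

By contrast, running a separate cross-validation for each candidate k would repeat the distance computation and the sort once per candidate, which is the overhead the abstract points to.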
