Explaining the Success of Nearest Neighbor Methods in Prediction
George H. Chen, Devavrat Shah

TL;DR
This monograph explains why nearest neighbor methods predict well, pairing nonasymptotic theoretical guarantees with a survey of the approximate nearest neighbor search techniques that let these methods scale to massive datasets.
Contribution
It offers a comprehensive analysis of the theoretical foundations and practical implementations of nearest neighbor methods, including recent developments in approximate search and learning distance metrics.
Findings
The monograph covers foundational nonasymptotic statistical guarantees for nearest-neighbor-based regression and classification.
Successful prediction relies on smoothness of the function being estimated (regression) and on a low probability that a test point lands near the decision boundary (classification).
Clustering structure enhances prediction success in practical case studies.
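The findings above concern k-nearest-neighbor prediction, which can be illustrated with a minimal sketch. It assumes Euclidean distance and brute-force search; the function names (`k_nearest`, `knn_regress`, `knn_classify`) and the toy data are hypothetical illustrations, not taken from the monograph.

```python
import math
from collections import Counter

def k_nearest(train_X, x, k):
    """Return indices of the k training points closest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return order[:k]

def knn_regress(train_X, train_y, x, k):
    """Predict by averaging the labels of the k nearest neighbors."""
    nbrs = k_nearest(train_X, x, k)
    return sum(train_y[i] for i in nbrs) / len(nbrs)

def knn_classify(train_X, train_y, x, k):
    """Predict by majority vote among the k nearest neighbors."""
    votes = Counter(train_y[i] for i in k_nearest(train_X, x, k))
    return votes.most_common(1)[0][0]

# Toy data (made up for illustration): two well-separated clusters on the line.
X = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
y_reg = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]   # smooth target, f(x) = x
y_cls = [0, 0, 0, 1, 1, 1]               # one class per cluster

pred_reg = knn_regress(X, y_reg, (0.15,), k=2)   # averages the two closest labels
pred_cls = knn_classify(X, y_cls, (1.05,), k=3)  # all three neighbors share a class
```

In this toy setup the regression target is smooth and the query points sit well inside a cluster, away from the class boundary, which are the conditions the findings associate with reliable predictions. Brute-force search costs time linear in the training set size per query, which is why approximate nearest neighbor search matters at scale.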
Abstract
Many modern methods for prediction leverage nearest neighbor search to find past training examples most similar to a test example, an idea that dates back in text to at least the 11th century and has stood the test of time. This monograph aims to explain the success of these methods, both in theory, for which we cover foundational nonasymptotic statistical guarantees on nearest-neighbor-based regression and classification, and in practice, for which we gather prominent methods for approximate nearest neighbor search that have been essential to scaling prediction systems reliant on nearest neighbor analysis to handle massive datasets. Furthermore, we discuss connections to learning distances for use with nearest neighbor methods, including how random decision trees and ensemble methods learn nearest neighbor structure, as well as recent developments in crowdsourcing and graphons. In…
Taxonomy
Topics: Data Mining Algorithms and Applications · Machine Learning and Data Classification
