k-Nearest Neighbour Classification of Datasets with a Family of Distances
Stan Hatko

TL;DR
This thesis explores alternative distance functions for k-nearest neighbour classifiers, extending universal consistency guarantees to them and reporting improved classification accuracy on multiple datasets.
Contribution
It introduces new theoretical results on the universal consistency of k-NN with random and Lipschitz-based distances, and proposes adaptive distance selection methods.
Findings
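Universal consistency holds for k-NN with random norms drawn independently of the sample and the query from a family of norms that satisfies a boundedness condition.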
Adaptive distance selection improves classification accuracy.
Extensions to quasinorms and locally Lipschitz distances are validated.
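To make the idea of choosing a distance from a family concrete, here is a minimal sketch using the l_p family as a stand-in. The function names (lp_distance, knn_predict, select_p) and the hold-out selection rule are illustrative assumptions, not the procedure used in the thesis, which this summary does not describe. Note that a distance chosen from the labelled data depends on the sample, so it falls outside the independence assumption of the random-norm consistency result.

```python
from collections import Counter


def lp_distance(x, y, p):
    """l_p distance between two points; a norm for p >= 1, a quasinorm for 0 < p < 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def knn_predict(train_x, train_y, query, k, p):
    """Majority vote among the k training points closest to `query` under l_p."""
    order = sorted(range(len(train_x)),
                   key=lambda i: lp_distance(train_x[i], query, p))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def select_p(train_x, train_y, val_x, val_y, k, candidates):
    """Hypothetical selection rule: keep the exponent with the best hold-out accuracy."""
    def accuracy(p):
        hits = sum(knn_predict(train_x, train_y, x, k, p) == y
                   for x, y in zip(val_x, val_y))
        return hits / len(val_x)
    return max(candidates, key=accuracy)


# Toy data, invented for illustration only.
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_y = ["a", "a", "b", "b"]
val_x = [(0.05, 0.1), (0.95, 1.0)]
val_y = ["a", "b"]

label = knn_predict(train_x, train_y, (0.2, 0.1), k=3, p=2)
best_p = select_p(train_x, train_y, val_x, val_y, k=1, candidates=[0.5, 1, 2])
```

On this toy data the query (0.2, 0.1) has two "a" points among its three nearest neighbours under the Euclidean distance, so label is "a". Ties between exponents are resolved in favour of the earliest candidate in the list.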
Abstract
The k-nearest neighbour (k-NN) classifier is one of the oldest and most important supervised learning algorithms for classifying datasets. Traditionally the Euclidean norm is used as the distance for the k-NN classifier. In this thesis we investigate the use of alternative distances for the k-NN classifier. We start by introducing some background notions in statistical machine learning. We define the k-NN classifier and discuss Stone's theorem and the proof that k-NN is universally consistent on the normed space R^d. We then prove that k-NN is universally consistent if we take a sequence of random norms (that are independent of the sample and the query) from a family of norms that satisfies a particular boundedness condition. We extend this result by replacing norms with distances based on uniformly locally Lipschitz functions that satisfy certain conditions. We…
Taxonomy
Topics: Machine Learning and Algorithms · Statistical Methods and Inference · Machine Learning and Data Classification
