k-Nearest Neighbour Classification of Datasets with a Family of Distances

Stan Hatko

arXiv:1512.00001 · stat.ML · December 2, 2015 · 2 citations

TL;DR

This thesis studies alternatives to the Euclidean distance in k-nearest neighbour (k-NN) classifiers. It extends the universal-consistency guarantees of k-NN to these distances and reports improved classification accuracy on several datasets.
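To make the idea of swapping the distance concrete, here is a minimal Python sketch of a k-NN classifier that takes the distance as a parameter, using the $\ell_p$ family as an example. The names `lp_distance` and `knn_predict` and the tie-breaking rule (smallest label wins) are illustrative choices of ours, not code from the thesis.

```python
from collections import Counter
from typing import Callable, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def lp_distance(p: float) -> Distance:
    """Hypothetical helper: return x, y -> ||x - y||_p (a norm distance for p >= 1)."""
    def dist(x: Vector, y: Vector) -> float:
        return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
    return dist


def knn_predict(train_x: Sequence[Vector], train_y: Sequence[int],
                query: Vector, k: int, dist: Distance) -> int:
    """Majority vote among the k training points closest to `query` under `dist`."""
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    best = max(votes.values())
    # Illustrative tie-break: among tied labels, return the smallest one.
    return min(label for label, count in votes.items() if count == best)


X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
y = [0, 0, 1, 1]
print(knn_predict(X, y, [0.2, 0.1], k=3, dist=lp_distance(2)))  # 0
```

Passing `lp_distance(1)` or any other callable changes only the neighbourhood geometry; the voting rule stays the same.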

Contribution

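It proves that k-NN is universally consistent when the distance is a random norm, drawn independently of the sample and the query from a family satisfying a boundedness condition, and when the distance is built from uniformly locally Lipschitz functions. It also proposes methods for selecting the distance adaptively.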
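The abstract shown here does not say how the adaptive selection works. One simple way to choose a distance from a family, shown below purely as an assumption, is to pick the $\ell_p$ exponent with the lowest leave-one-out error. The candidate list, the names `loo_error` and `select_p`, and the use of leave-one-out are hypothetical and may differ from the methods proposed in the thesis. Note that $p = 0.5$ gives an $\ell_p$ quasinorm rather than a norm.

```python
from collections import Counter


def lp(x, y, p):
    """l_p distance between x and y (only a quasinorm distance when 0 < p < 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def knn_label(xs, ys, q, k, p):
    """k-NN majority vote under the l_p distance; ties go to the smallest label."""
    order = sorted(range(len(xs)), key=lambda i: lp(xs[i], q, p))
    votes = Counter(ys[i] for i in order[:k])
    best = max(votes.values())
    return min(lbl for lbl, c in votes.items() if c == best)


def loo_error(xs, ys, k, p):
    """Leave-one-out error rate of k-NN with the l_p distance."""
    errors = 0
    for i in range(len(xs)):
        rest_x, rest_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        errors += knn_label(rest_x, rest_y, xs[i], k, p) != ys[i]
    return errors / len(xs)


def select_p(xs, ys, k, candidates=(0.5, 1.0, 2.0, 4.0)):
    """Hypothetical selection rule: the exponent with the lowest LOO error."""
    return min(candidates, key=lambda p: loo_error(xs, ys, k, p))


xs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
ys = [0, 0, 0, 1, 1, 1]
# All candidates tie at zero error on this easy data, so the first (0.5) is returned.
print(select_p(xs, ys, k=1))
```

Leave-one-out reuses every point for validation, which suits small datasets; for larger ones, k-fold cross-validation would be the cheaper variant of the same idea.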

Findings

01. Universal consistency holds for k-NN with random norms.
02. Adaptive distance selection improves classification accuracy.
03. Extensions to quasinorms and to distances based on locally Lipschitz functions are validated.
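For readers unfamiliar with quasinorms, the standard example is the $\ell_p$ functional with $0 < p < 1$: it fails the triangle inequality but satisfies the weaker bound $\|x+y\|_p \le 2^{1/p-1}(\|x\|_p + \|y\|_p)$. The short Python check below illustrates this for $p = 0.5$; it is a generic illustration (the helper name `lp_quasinorm` is ours) and does not reproduce any experiment from the thesis.

```python
def lp_quasinorm(x, p):
    """(sum |x_i|^p)^(1/p): a norm for p >= 1, only a quasinorm for 0 < p < 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


p = 0.5
x, y = [1.0, 0.0], [0.0, 1.0]
s = [a + b for a, b in zip(x, y)]
lhs = lp_quasinorm(s, p)                       # ||x + y||_p = 4.0
rhs = lp_quasinorm(x, p) + lp_quasinorm(y, p)  # ||x||_p + ||y||_p = 2.0
C = 2 ** (1 / p - 1)                           # quasi-triangle constant, 2.0 here
print(lhs > rhs)       # True: the ordinary triangle inequality fails
print(lhs <= C * rhs)  # True: the weakened inequality holds (with equality here)
```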

Abstract

The $k$-nearest neighbour ($k$-NN) classifier is one of the oldest and most important supervised learning algorithms for classifying datasets. Traditionally the Euclidean norm is used as the distance for the $k$-NN classifier. In this thesis we investigate the use of alternative distances for the $k$-NN classifier. We start by introducing some background notions in statistical machine learning. We define the $k$-NN classifier and discuss Stone's theorem and the proof that $k$-NN is universally consistent on the normed space $\mathbb{R}^{d}$. We then prove that $k$-NN is universally consistent if we take a sequence of random norms (that are independent of the sample and the query) from a family of norms that satisfies a particular boundedness condition. We extend this result by replacing norms with distances based on uniformly locally Lipschitz functions that satisfy certain conditions. We…
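The random-norm setting can be sketched as follows. The abstract does not state the boundedness condition, so this example assumes one common form: every norm $N$ in the family satisfies $c\|x\|_2 \le N(x) \le C\|x\|_2$ for fixed constants $0 < c \le C$. The hypothetical family here is weighted Euclidean norms $N_w(x) = \sqrt{\sum_i w_i x_i^2}$ with each $w_i$ drawn uniformly from $[a, b]$, which gives $c = \sqrt{a}$ and $C = \sqrt{b}$. One norm is drawn per call from a seeded generator that never sees the sample or the query, and $k = \lfloor\sqrt{n}\rfloor$ is one choice satisfying Stone's conditions $k \to \infty$ and $k/n \to 0$. The function names are ours.

```python
import math
import random
from collections import Counter


def random_weighted_norm(d, a, b, rng):
    """Hypothetical family: N_w(x) = sqrt(sum_i w_i x_i^2) with w_i ~ Uniform[a, b].

    For 0 < a <= b this gives sqrt(a)*||x||_2 <= N_w(x) <= sqrt(b)*||x||_2,
    which is the boundedness condition assumed in this sketch.
    """
    w = [rng.uniform(a, b) for _ in range(d)]
    return lambda x: math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))


def knn_random_norm(train_x, train_y, query, a=0.5, b=2.0, seed=None):
    """k-NN with a norm drawn at random, independently of the sample and query."""
    rng = random.Random(seed)  # seeded RNG: the draw never looks at data values
    norm = random_weighted_norm(len(query), a, b, rng)
    n = len(train_x)
    k = max(1, math.isqrt(n))  # k = floor(sqrt(n)), so k -> inf and k/n -> 0
    order = sorted(range(n), key=lambda i: norm([u - v for u, v in zip(train_x[i], query)]))
    votes = Counter(train_y[i] for i in order[:k])
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)


class0 = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8], [0.8, 0.2], [0.4, 0.6]]
class1 = [[u + 5, v + 5] for u, v in class0]
print(knn_random_norm(class0 + class1, [0] * 8 + [1] * 8, [0.3, 0.3], seed=1))  # 0
```

Because every norm in this family is sandwiched between two multiples of the Euclidean norm, the clusters stay well separated whatever weights are drawn, so the prediction above does not depend on the seed.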

Taxonomy

Topics: Machine Learning and Algorithms · Statistical Methods and Inference · Machine Learning and Data Classification