Finding Relevant Points for Nearest-Neighbor Classification

David Eppstein

arXiv:2110.06163·cs.DS·October 13, 2021

TL;DR

This paper gives a simple algorithm that identifies the relevant training points in nearest-neighbor classification, thinning the training set without changing any classification outcome. Its time bounds improve on Clarkson's earlier algorithm (FOCS 1994) in any constant dimension d ≥ 3.

Contribution

It presents a simple algorithm for thinning a training set down to its relevant points, using minimum spanning tree and extreme-point (convex hull vertex) computations as subroutines, with time bounds that improve on Clarkson's algorithm (FOCS 1994) in any constant dimension d ≥ 3.

Findings

01

The algorithm thins the training set to its relevant points, leaving every nearest-neighbor classification unchanged.

02

Time bounds improve on Clarkson's algorithm (FOCS 1994) in any constant dimension d ≥ 3.

03

Efficiently identifies points whose removal would alter classification outcomes.

Abstract

In nearest-neighbor classification problems, a set of $d$-dimensional training points are given, each with a known classification, and are used to infer unknown classifications of other points by using the same classification as the nearest training point. A training point is relevant if its omission from the training set would change the outcome of some of these inferences. We provide a simple algorithm for thinning a training set down to its subset of relevant points, using as subroutines algorithms for finding the minimum spanning tree of a set of points and for finding the extreme points (convex hull vertices) of a set of points. The time bounds for our algorithm, in any constant dimension $d \geq 3$, improve on a previous algorithm for the same problem by Clarkson (FOCS 1994).
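
To make the definition of a relevant point concrete, here is a minimal one-dimensional sketch. It is not the paper's algorithm, which targets dimensions $d \geq 3$ and relies on minimum spanning tree and extreme-point subroutines. It assumes distinct coordinates, in which case a point is relevant exactly when one of its neighbors in sorted order carries a different label. The function names `nearest_label` and `relevant_points_1d` and the sample data are hypothetical, chosen only for illustration.

```python
def nearest_label(points, query):
    """Classify `query` with the label of its nearest training point (1-D).

    `points` is a list of (coordinate, label) pairs.
    """
    return min(points, key=lambda p: abs(p[0] - query))[1]


def relevant_points_1d(points):
    """Return the relevant points of a 1-D training set.

    Assumes distinct coordinates. In one dimension, removing a point hands
    its region to its left and right neighbors, so the point matters only
    if at least one of those neighbors has a different label.
    """
    pts = sorted(points)
    relevant = []
    for i, (x, label) in enumerate(pts):
        # Immediate neighbors in sorted order (zero, one, or two of them).
        neighbors = pts[max(i - 1, 0):i] + pts[i + 1:i + 2]
        if any(other != label for _, other in neighbors):
            relevant.append((x, label))
    return relevant


# Hypothetical training data: two classes with a boundary between 1.0 and 2.0.
training = [(0.0, "a"), (1.0, "a"), (2.0, "b"), (3.0, "b"), (4.0, "b")]
thinned = relevant_points_1d(training)
print(thinned)  # [(1.0, 'a'), (2.0, 'b')]
```

Only the two points on either side of the class boundary survive, and classifying any query against `thinned` gives the same label as classifying it against `training`. Dropping a relevant point such as `(2.0, "b")` instead changes the label of nearby queries, for example at 1.9.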

Taxonomy

Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Machine Learning and Data Classification