A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

Hendrik Fichtenberger; Dennis Rohde

arXiv:1810.05064 · cs.LG · December 3, 2018

TL;DR

This paper introduces a property testing algorithm for $k$-nearest neighbor ($k$-NN) graphs that efficiently decides whether a given graph is a $k$-NN graph or far from being one, and it can identify inaccurate models faster than the models can be built.

Contribution

The paper develops the first randomized property tester for $k$-NN graphs with proven complexity bounds and empirical validation.

Findings

01

The tester has complexity $O(\sqrt{n}\, k^2 / \epsilon^2)$, where $n = |P|$ is the number of points.

02

It distinguishes $k$-NN graphs from graphs that are $\epsilon$-far from being $k$-NN graphs.

03

Empirical results show that it detects inaccurate $k$-NN models faster than the models can be built.
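
The findings describe a tester that accepts every $k$-NN graph and rejects graphs that are $\epsilon$-far. The Python sketch below is not the authors' algorithm: it is a hypothetical, deliberately naive baseline (the names has_correct_out_edges and naive_knn_tester are ours) that samples points uniformly and verifies each sampled point's out-edges with a brute-force scan over $P$. It assumes a Euclidean metric and treats ties as acceptable. It shows the one-sided error (a true $k$-NN graph is never rejected), but it spends $O(n)$ distance evaluations per sample, far more than the $O(\sqrt{n}\, k^2 / \epsilon^2)$ bound stated in the findings.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def has_correct_out_edges(points: List[Point], graph: List[List[int]], k: int, p: int) -> bool:
    """Check whether point p's out-neighbors form a valid k-NN set.

    Assumption (ours): a set S of k distinct points other than p is valid iff
    no point outside S (and other than p) is strictly closer to p than the
    farthest member of S, so ties at the boundary are accepted.
    """
    out = set(graph[p])
    if len(graph[p]) != k or len(out) != k or p in out:
        return False
    radius = max((math.dist(points[p], points[q]) for q in out), default=0.0)
    # Brute-force scan over all n points: this is what makes the baseline expensive.
    for q in range(len(points)):
        if q != p and q not in out and math.dist(points[p], points[q]) < radius:
            return False
    return True


def naive_knn_tester(points: List[Point], graph: List[List[int]], k: int,
                     num_samples: int, rng: Optional[random.Random] = None) -> bool:
    """Hypothetical sampling baseline with one-sided error.

    Returns True (accept) if every sampled point has correct out-edges and
    False (reject) as soon as one sampled point is a witness of an error.
    """
    rng = rng or random.Random()
    n = len(points)
    for _ in range(num_samples):
        p = rng.randrange(n)
        if not has_correct_out_edges(points, graph, k, p):
            return False  # witness found: G is certainly not a k-NN graph
    return True


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (3.0,), (6.0,), (10.0,)]
    good = [[1], [0], [1], [2], [3]]  # correct 1-NN graph on a line
    bad = [[4], [4], [4], [4], [0]]   # every point has a wrong out-edge
    print(naive_knn_tester(pts, good, 1, 20, random.Random(0)))
    print(naive_knn_tester(pts, bad, 1, 3, random.Random(0)))
```

On the toy line example, the correct 1-NN graph is accepted for any sample, and the graph in which every point is wrong is rejected at the first sample. A real tester must find such witnesses without scanning all of $P$ for every sampled point.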

Abstract

In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P \subset \mathbb{R}^{\delta}$ and a directed graph $G = (P, E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $\epsilon$-far from being a $k$-NN graph? Here, $\epsilon$-far means that one has to change more than an $\epsilon$-fraction of the edges in…
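
For small inputs, the $k$-NN graph property defined in the abstract can be checked exactly. The helpers below are a hypothetical sketch (the names true_knn, missing_neighbors and is_knn_graph are ours; a Euclidean metric is assumed and distance ties are broken by the smaller index). They compute each point's true $k$ nearest neighbors by brute force and count how many of them are missing from its outgoing edges. Because the abstract is cut off before the normalization of $\epsilon$-far is given, the sketch reports raw per-point counts rather than a distance to the property. The exact check costs $O(n^2 \log n)$ time, which is the kind of cost a property tester is designed to avoid.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def true_knn(points: List[Point], k: int, p: int) -> List[int]:
    """Brute-force k nearest neighbors of points[p], excluding p itself.

    Assumption (ours): ties in distance are broken by the smaller index.
    """
    others = [q for q in range(len(points)) if q != p]
    others.sort(key=lambda q: (math.dist(points[p], points[q]), q))
    return others[:k]


def missing_neighbors(points: List[Point], graph: List[List[int]], k: int) -> List[int]:
    """For each point p, count its true k nearest neighbors absent from graph[p]."""
    return [len(set(true_knn(points, k, p)) - set(graph[p])) for p in range(len(points))]


def is_knn_graph(points: List[Point], graph: List[List[int]], k: int) -> bool:
    """Exact check: every out-list has k distinct entries and none is missing."""
    well_formed = all(len(graph[p]) == k and len(set(graph[p])) == k
                      for p in range(len(points)))
    return well_formed and sum(missing_neighbors(points, graph, k)) == 0


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (3.0,), (6.0,), (10.0,)]
    good = [[1], [0], [1], [2], [3]]
    bad = [[4], [4], [4], [4], [0]]
    print(is_knn_graph(pts, good, 1))       # True
    print(missing_neighbors(pts, bad, 1))   # [1, 1, 1, 1, 1]
```

Note that with these tie-breaking rules the exact check is stricter than a check that accepts any valid tie resolution: for the point at 3.0 and $k = 2$, the points at 0.0 and 6.0 are equally far, and true_knn keeps the one with the smaller index.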

Taxonomy

Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Optimization and Search Problems