A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice
Hendrik Fichtenberger, Dennis Rohde

TL;DR
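This paper introduces a property testing algorithm for $k$-nearest neighbor graphs that decides, by inspecting only a sublinear part of the input, whether a graph is a correct $k$-NN graph or far from being one. In experiments it detects inaccurate $k$-NN models faster than the models can be built.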
Contribution
The paper develops the first randomized property tester for $k$-NN graphs with proven complexity bounds and empirical validation.
Findings
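The tester has complexity $O(\sqrt{n} k^2 / \epsilon^2)$, where $n$ is the number of points and the complexity counts the vertices and edges it inspects.
It distinguishes $k$-NN graphs from graphs that are $\epsilon$-far from being a $k$-NN graph.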
Empirical results show it detects inaccurate $k$-NN models faster than building the models.
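To make the sampling idea behind these findings concrete, here is a minimal Python sketch of a one-sided sampling check. It is not the paper's algorithm and does not achieve the stated bound. The function name `sample_check` and the parameters `num_vertices` and `num_witnesses` are hypothetical, and Euclidean distance is assumed. For each sampled vertex, the check looks for a sampled witness point that is strictly closer than the farthest listed neighbor.

```python
import math
import random


def sample_check(points, graph, k, num_vertices, num_witnesses, rng):
    """Return False if a sampled witness proves `graph` is not a k-NN graph.

    points: list of coordinate tuples; graph[v]: list of out-neighbor indices.
    The sample sizes are illustrative parameters, not values from the paper.
    """
    n = len(points)
    for v in rng.sample(range(n), min(num_vertices, n)):
        nbrs = graph[v]
        # A k-NN graph needs exactly k distinct out-neighbors and no self-loop.
        if len(set(nbrs)) != k or v in nbrs:
            return False
        # Distance from v to its farthest listed neighbor.
        radius = max(math.dist(points[v], points[u]) for u in nbrs)
        for w in rng.sample(range(n), min(num_witnesses, n)):
            if w == v or w in nbrs:
                continue
            # A non-neighbor strictly inside the radius is a witness:
            # some listed neighbor cannot be among the k nearest.
            if math.dist(points[v], points[w]) < radius:
                return False
    return True
```

A witness can exist only when the graph is wrong, so this check never rejects a correct $k$-NN graph. It can miss errors when the samples are too small.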
Abstract
In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P$ and a directed graph $G = (P, E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $\epsilon$-far from being a $k$-NN graph? Here, $\epsilon$-far means that one has to change more than an $\epsilon$-fraction of the edges in…
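The two notions in the abstract, being a $k$-NN graph and being $\epsilon$-far from one, can be illustrated with a brute-force reference computation. The sketch below assumes Euclidean distance, pairwise distinct distances (so the $k$-NN graph is unique), and exactly $k$ outgoing edges per vertex. The helper names `knn_sets` and `fraction_of_wrong_edges` are illustrative and do not come from the paper.

```python
import math


def knn_sets(points, k):
    """For each point index, return the set of indices of its k nearest points."""
    n = len(points)
    result = []
    for i in range(n):
        # Sort all other points by distance to point i (index breaks ties).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        result.append(set(others[:k]))
    return result


def fraction_of_wrong_edges(points, graph, k):
    """Fraction of the n*k edges that do not point to a true k nearest neighbor.

    Under the stated assumptions, each wrong edge must be replaced once,
    so this is the fraction of edges that has to change.
    """
    truth = knn_sets(points, k)
    wrong = sum(len(set(graph[i]) - truth[i]) for i in range(len(points)))
    return wrong / (len(points) * k)
```

Under these assumptions the graph is a $k$-NN graph when the fraction is 0, and it is $\epsilon$-far when the fraction exceeds $\epsilon$. The brute-force computation compares all pairs of points, which is the cost a property tester is meant to avoid.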
Taxonomy
Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Optimization and Search Problems
