On the Difficulty of Nearest Neighbor Search
Junfeng He (Columbia University), Sanjiv Kumar (Google Research), Shih-Fu Chang (Columbia University)

TL;DR
This paper introduces Relative Contrast, a new measure to evaluate the inherent difficulty of approximate nearest neighbor search in datasets, analyzing how data properties influence search complexity and explaining the effectiveness of certain heuristic algorithms.
Contribution
It presents the first concrete difficulty measure for approximate NN search, linking data characteristics to search complexity and analyzing existing heuristics within this framework.
Findings
Relative Contrast quantifies dataset difficulty for NN search.
The difficulty measure influences the complexity of Locality Sensitive Hashing.
Existing heuristics are explained as special cases of the measure.
Abstract
Fast approximate nearest neighbor (NN) search in large databases is becoming popular. Several powerful learning-based formulations have been proposed recently. However, not much attention has been paid to a more fundamental question: how difficult is (approximate) nearest neighbor search in a given data set, and which data properties affect that difficulty, and how? This paper introduces the first concrete measure, called Relative Contrast, that can be used to evaluate the influence of several crucial data characteristics such as dimensionality, sparsity, and database size simultaneously in arbitrary normed metric spaces. Moreover, we present a theoretical analysis to prove how the difficulty measure (relative contrast) determines/affects the complexity of Locality Sensitive Hashing, a popular approximate NN search method. Relative contrast also provides an…
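The truncated abstract above does not spell out how Relative Contrast is computed. As a rough illustration only, the sketch below assumes it is the ratio of the expected mean distance to the expected nearest-neighbor distance over a set of queries, measured in an L_p norm. The function names (`lp_distance`, `relative_contrast`, `random_points`) and the uniform random data are hypothetical and chosen for this example; they are not taken from the paper.

```python
import random


def lp_distance(a, b, p=2.0):
    """L_p distance between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def relative_contrast(database, queries, p=2.0):
    """Estimate mean-distance / nearest-distance, averaged over queries.

    Assumed definition: C_r = E_q[D_mean] / E_q[D_min]. Values close to 1
    mean the nearest neighbor is barely closer than an average point.
    """
    mean_dists = []
    min_dists = []
    for q in queries:
        dists = [lp_distance(q, x, p) for x in database]
        mean_dists.append(sum(dists) / len(dists))
        min_dists.append(min(dists))
    expected_mean = sum(mean_dists) / len(mean_dists)
    expected_min = sum(min_dists) / len(min_dists)
    return expected_mean / expected_min


def random_points(n, dim, rng):
    """Hypothetical test data: n points drawn uniformly from [0, 1)^dim."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Compare the estimate across dimensionalities on synthetic data.
    for dim in (2, 32, 256):
        db = random_points(200, dim, rng)
        qs = random_points(10, dim, rng)
        print(dim, relative_contrast(db, qs))
```

Under this assumed definition, the estimate is always at least 1, and on uniform synthetic data it tends to shrink toward 1 as dimensionality grows, which is one way to see why high-dimensional search is harder. The sketch does not reproduce the paper's analysis of sparsity, database size, or hashing complexity.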
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Robotics and Sensor-Based Localization
