A Bi-metric Framework for Fast Similarity Search
Haike Xu, Sandeep Silwal, Piotr Indyk

TL;DR
This paper introduces a bi-metric framework for nearest neighbor search that pairs a cheap proxy metric with an expensive ground-truth metric, reaching the accuracy of the expensive metric with a limited number of metric calls, with both theoretical guarantees and empirical results.
Contribution
It presents a framework for constructing nearest neighbor data structures using only the proxy metric, while the query procedure achieves the accuracy of the ground-truth metric with a limited number of calls to both metrics.
Findings
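As long as the proxy metric approximates the ground-truth metric up to a bounded factor, the data structure achieves arbitrarily good approximation guarantees with respect to the ground-truth metric.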
Applied to text retrieval, it outperforms re-ranking in accuracy-efficiency tradeoffs.
The theoretical results instantiate the framework for two popular nearest neighbor search algorithms: DiskANN and Cover Tree.
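The "bounded factor" condition and the target guarantee can be written out as follows. This is one natural formalization, not text taken from this page: the symbols d (proxy metric), D (ground-truth metric), C (distortion factor), and ε (target accuracy) are notation assumed here for illustration.

```latex
% Assumed notation: d = proxy metric, D = ground-truth metric, C >= 1.
% Proxy approximates ground truth up to a bounded factor:
\[
  d(x, y) \;\le\; D(x, y) \;\le\; C \cdot d(x, y)
  \qquad \text{for all points } x, y .
\]
% Target: for a query q over dataset P, return \hat{p} \in P such that
\[
  D(q, \hat{p}) \;\le\; (1 + \varepsilon) \cdot \min_{p \in P} D(q, p),
\]
% for any chosen \varepsilon > 0, with the index built from d alone.
```

Read this way, C controls how misleading the proxy can be, while ε is chosen freely at query time.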
Abstract
We propose a new "bi-metric" framework for designing nearest neighbor data structures. Our framework assumes two dissimilarity functions: a ground-truth metric that is accurate but expensive to compute, and a proxy metric that is cheaper but less accurate. In both theory and practice, we show how to construct data structures using only the proxy metric such that the query procedure achieves the accuracy of the expensive metric, while only using a limited number of calls to both metrics. Our theoretical results instantiate this framework for two popular nearest neighbor search algorithms: DiskANN and Cover Tree. In both cases we show that, as long as the proxy metric used to construct the data structure approximates the ground-truth metric up to a bounded factor, our data structure achieves arbitrarily good approximation guarantees with respect to the ground-truth metric. On the…
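The idea described above can be illustrated with a toy sketch. This is not the authors' implementation: a brute-force k-NN graph stands in for a DiskANN-style index, Chebyshev and Euclidean distances stand in for the proxy and ground-truth metrics, and every name here (`proxy`, `expensive`, `build_knn_graph`, `search`, `budget`) is hypothetical. The index is built with the proxy metric only, and the query spends a fixed budget of expensive-metric calls.

```python
import heapq
import math
import random


def expensive(x, y):
    # Stand-in for the ground-truth metric D: Euclidean distance.
    return math.dist(x, y)


def proxy(x, y):
    # Stand-in for the proxy metric d: Chebyshev distance.
    # In 2-D, proxy <= expensive <= sqrt(2) * proxy.
    return max(abs(a - b) for a, b in zip(x, y))


def build_knn_graph(points, k):
    # Index construction touches only the proxy metric.
    graph = []
    for i, p in enumerate(points):
        ranked = sorted((proxy(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in ranked[:k]])
    return graph


def search(points, graph, query, budget):
    # Pick an entry point with the proxy metric (a linear scan, for brevity).
    start = min(range(len(points)), key=lambda i: proxy(query, points[i]))
    best = (expensive(query, points[start]), start)
    calls = 1  # number of expensive-metric evaluations so far
    visited = {start}
    frontier = [best]
    # Best-first search over the proxy-built graph, scored by the expensive metric.
    while frontier and calls < budget:
        _, node = heapq.heappop(frontier)
        for nb in graph[node]:
            if nb in visited:
                continue
            if calls >= budget:
                break
            visited.add(nb)
            cand = (expensive(query, points[nb]), nb)
            calls += 1
            heapq.heappush(frontier, cand)
            best = min(best, cand)
    return best[1], best[0], calls


random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
graph = build_knn_graph(points, k=8)
query = (0.5, 0.5)
idx, dist, calls = search(points, graph, query, budget=20)
print(idx, dist, calls)
```

By contrast, a re-ranking baseline would retrieve a fixed candidate list with the proxy metric and score only those candidates with the expensive metric. The sketch instead lets the expensive metric steer which graph neighbors are explored next.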
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Data Management and Algorithms
