Searching in one billion vectors: re-rank with source coding
Hervé Jégou (INRIA - IRISA), Romain Tavenard (INRIA - IRISA), Matthijs Douze (INRIA Rhône-Alpes / LJK Laboratoire Jean Kuntzmann, SED), Laurent Amsaleg (INRIA - IRISA)

TL;DR
This paper introduces a re-ranking method for high-dimensional vector search that refines neighbor hypotheses using short quantization codes rather than the full vectors, improving accuracy on billion-scale datasets while using little memory.
Contribution
It proposes a novel re-ranking approach based on source coding that enhances existing indexing methods for billion-scale high-dimensional vector search.
Findings
Accurately re-ranks neighbors with minimal memory
Efficiently refines distances using short quantization codes
Demonstrates effectiveness on a new billion-vector dataset
Abstract
Recent indexing techniques inspired by source coding have proven successful at indexing billions of high-dimensional vectors in memory. In this paper, we propose an approach that re-ranks the neighbor hypotheses obtained by these compressed-domain indexing methods. In contrast to the usual post-verification scheme, which performs exact distance calculation on the short-list of hypotheses, the estimated distances are refined based on short quantization codes, to avoid reading the full vectors from disk. We have released a new public dataset of one billion 128-dimensional vectors and proposed an experimental setup to evaluate high-dimensional indexing algorithms on a realistic scale. Experiments show that our method accurately and efficiently re-ranks the neighbor hypotheses using little memory compared to the full vector representation.
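The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which quantizers are used, so plain k-means codebooks stand in for them here, and every name (`kmeans`, `assign_codes`, `build_index`, `search`) and every parameter value is hypothetical. Stage 1 ranks all vectors by a distance estimated from a coarse code; stage 2 re-ranks only the short-list using a second short code that quantizes the residual, so the full vectors are never read.

```python
import numpy as np

def assign_codes(x, centroids):
    """Index of the nearest centroid for each row of x."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(x, k, iters=10, seed=0):
    """Tiny Lloyd's k-means returning a (k, d) codebook (illustrative only)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        codes = assign_codes(x, centroids)
        for j in range(k):
            members = x[codes == j]
            if len(members):  # leave empty clusters unchanged
                centroids[j] = members.mean(axis=0)
    return centroids

def build_index(base, k_coarse=64, k_refine=64):
    """Store two short codes per vector: a coarse code and a residual code."""
    coarse_cb = kmeans(base, k_coarse, seed=0)
    coarse_codes = assign_codes(base, coarse_cb)
    residuals = base - coarse_cb[coarse_codes]
    refine_cb = kmeans(residuals, k_refine, seed=1)
    refine_codes = assign_codes(residuals, refine_cb)
    return coarse_cb, coarse_codes, refine_cb, refine_codes

def search(query, index, shortlist_size=100, topk=5):
    coarse_cb, coarse_codes, refine_cb, refine_codes = index
    # Stage 1: estimated distances from the coarse reconstruction only.
    approx = coarse_cb[coarse_codes]
    d_est = ((approx - query) ** 2).sum(axis=1)
    shortlist = np.argsort(d_est, kind="stable")[:shortlist_size]
    # Stage 2: refine the short-list with the residual codes
    # instead of fetching the original vectors.
    refined = approx[shortlist] + refine_cb[refine_codes[shortlist]]
    d_ref = ((refined - query) ** 2).sum(axis=1)
    return shortlist[np.argsort(d_ref, kind="stable")[:topk]]

# Synthetic data, far smaller than the billion-vector setting.
rng = np.random.default_rng(42)
base = rng.standard_normal((2000, 16)).astype(np.float32)
query = rng.standard_normal(16).astype(np.float32)
index = build_index(base)
result = search(query, index)
print(result)
```

In this sketch each vector costs two small integer codes rather than its full set of coordinates, which is the memory trade-off the abstract refers to. The short-list size controls how many refined distances are computed per query.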
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization · Music and Audio Processing
