GPU-Native Approximate Nearest Neighbor Search with IVF-RaBitQ: Fast Index Build and Search
Jifan Shi, Jianyang Gao, James Xia, Tamás Béla Fehér, Cheng Long

TL;DR
This paper introduces IVF-RaBitQ, a GPU-native approximate nearest neighbor search method that combines cluster-based indexing with quantization, achieving fast index building, high recall, and efficient search on high-dimensional data.
Contribution
It presents a scalable GPU-native RaBitQ quantization method and GPU-optimized search schemes, enabling fast index construction and high-performance retrieval in GPU environments.
Findings
2.2x higher QPS than CAGRA at 0.95 recall
7.7x faster index construction than existing methods
Over 2.7x higher throughput than IVF-PQ without reranking
Abstract
Approximate nearest neighbor search (ANNS) on GPUs is gaining increasing popularity for modern retrieval and recommendation workloads that operate over massive high-dimensional vectors. Graph-based indexes deliver high recall and throughput but incur heavy build-time and storage costs. In contrast, cluster-based methods build and scale efficiently yet often need many probes for high recall, straining memory bandwidth and compute. Aiming to simultaneously achieve fast index build, high-throughput search, high recall, and low storage requirement for GPUs, we present IVF-RaBitQ (GPU), a GPU-native ANNS solution that integrates the cluster-based method IVF with RaBitQ quantization into an efficient GPU index build/search pipeline. Specifically, for index build, we develop a scalable GPU-native RaBitQ quantization method that enables fast and accurate low-bit encoding at scale. For search,…
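The trade-off the abstract describes for cluster-based methods (cheap to build, but recall depends on how many clusters are probed) can be illustrated with a minimal CPU sketch of a plain IVF index. This is not the authors' GPU implementation and it omits RaBitQ quantization entirely; the function names (`build_ivf`, `ivf_search`, `brute_force`), the randomly sampled centroids standing in for k-means, and the toy data sizes are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe, k):
    """Scan only the nprobe clusters closest to the query; return top-k ids."""
    probe_order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [vid for c in probe_order[:nprobe] for vid in lists[c]]
    candidates.sort(key=lambda vid: sq_dist(query, vectors[vid]))
    return candidates[:k]

def brute_force(query, vectors, k):
    """Exact top-k by scanning every vector (ground truth for recall)."""
    return sorted(range(len(vectors)), key=lambda vid: sq_dist(query, vectors[vid]))[:k]

# Toy data: all sizes are hypothetical and chosen only to keep the demo fast.
rng = random.Random(0)
dim, n, n_clusters, k = 8, 500, 16, 10
vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
# Assumption: sampled data points stand in for trained k-means centroids.
centroids = rng.sample(vectors, n_clusters)
lists = build_ivf(vectors, centroids)

query = [rng.gauss(0, 1) for _ in range(dim)]
truth = set(brute_force(query, vectors, k))

# Recall@k as a function of the number of probed clusters.
recalls = {}
for nprobe in (1, 4, 16):
    found = set(ivf_search(query, vectors, centroids, lists, nprobe, k))
    recalls[nprobe] = len(found & truth) / k
    print(f"nprobe={nprobe:2d}  recall@{k}={recalls[nprobe]:.2f}")
```

Probing every cluster reduces to an exact scan, so recall reaches 1.0 only at the cost of touching all vectors. That is the memory-bandwidth pressure the abstract attributes to cluster-based indexes, and compressing the stored vectors with a quantizer is one way to make each probe cheaper.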
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Information Retrieval and Search Behavior
