GRAB-ANNS: High-Throughput Indexing and Hybrid Search via GPU-Native Bucketing
Xinkui Zhao, Hengxuan Lou, Yifan Zhang, Junjie Dai, Shuiguang Deng, Jianwei Yin

TL;DR
GRAB-ANNS introduces a GPU-native hybrid search index that rethinks hybrid indexing with a hardware-first, bucket-based approach. It raises query throughput by up to 240.1x over CPU-based systems, enabling efficient large-scale AI search applications.
Contribution
The paper presents a novel GPU-native graph index with a bucket-based memory layout and a hybrid graph topology, optimized for high-throughput dynamic hybrid search.
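The abstract below is truncated before it says what the bucket layout turns range predicates into, so the following is only a generic, hypothetical illustration of attribute bucketing. It is not the GRAB-ANNS layout. Item ids are sorted by a scalar attribute and cut into fixed-size buckets, and each bucket's min/max attribute is recorded. A range predicate `[lo, hi]` then selects a contiguous run of buckets via binary search. All names (`BucketedAttributes`, `build`, `range_buckets`, `range_ids`) are assumptions made for this sketch.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class BucketedAttributes:
    """Hypothetical layout: ids sorted by attribute, split into fixed-size buckets."""
    ids: list[int]            # item ids in ascending attribute order
    attrs: list[float]        # attribute values, aligned with ids
    bucket_size: int
    bucket_min: list[float]   # smallest attribute in each bucket
    bucket_max: list[float]   # largest attribute in each bucket


def build(attr_by_id: dict[int, float], bucket_size: int) -> BucketedAttributes:
    # Sort ids by (attribute, id) so the layout is deterministic.
    order = sorted(attr_by_id, key=lambda i: (attr_by_id[i], i))
    attrs = [attr_by_id[i] for i in order]
    starts = range(0, len(attrs), bucket_size)
    mins = [attrs[s] for s in starts]
    maxs = [attrs[min(s + bucket_size, len(attrs)) - 1] for s in starts]
    return BucketedAttributes(order, attrs, bucket_size, mins, maxs)


def range_buckets(layout: BucketedAttributes, lo: float, hi: float) -> tuple[int, int]:
    """(first, last) bucket indices overlapping [lo, hi]; empty when first > last."""
    first = bisect_left(layout.bucket_max, lo)       # first bucket with max >= lo
    last = bisect_right(layout.bucket_min, hi) - 1   # last bucket with min <= hi
    return first, last


def range_ids(layout: BucketedAttributes, lo: float, hi: float) -> list[int]:
    first, last = range_buckets(layout, lo, hi)
    out: list[int] = []
    for b in range(first, last + 1):
        start = b * layout.bucket_size
        end = min(start + layout.bucket_size, len(layout.ids))
        if lo <= layout.bucket_min[b] and layout.bucket_max[b] <= hi:
            # Interior bucket: every item qualifies, no per-item test needed.
            out.extend(layout.ids[start:end])
        else:
            # Boundary bucket: filter item by item.
            out.extend(
                i for i, a in zip(layout.ids[start:end], layout.attrs[start:end])
                if lo <= a <= hi
            )
    return out
```

Because the data are sorted, only the first and last buckets of the selected run can be partially covered. Every bucket in between is processed uniformly with no per-item branch, which is the kind of regular work that suits wide parallel hardware.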
Findings
Achieves up to 240.1x higher query throughput than CPU-based systems.
Supports efficient batched insertions and parallel graph maintenance on GPUs.
Maintains high recall while significantly improving performance.
Abstract
Hybrid search, which jointly optimizes vector similarity and structured predicate filtering, has become a fundamental building block for modern AI-driven systems. While recent predicate-aware ANN indices improve filtering efficiency on CPUs, their performance is increasingly constrained by limited memory bandwidth and parallelism. Although GPUs offer massive parallelism and superior memory bandwidth, directly porting CPU-centric hybrid search algorithms to GPUs leads to severe performance degradation due to architectural mismatches, including irregular memory access, branch divergence, and excessive CPU-GPU synchronization. In this paper, we present GRAB-ANNS, a high-throughput, GPU-native graph index for dynamic hybrid search. Our key insight is to rethink hybrid indexing from a hardware-first perspective. We introduce a bucket-based memory layout that transforms range predicates into…
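To make the abstract's definition of hybrid search concrete, here is a minimal exact baseline. It keeps only the items whose structured attribute satisfies a predicate, then returns the `k` nearest of those by Euclidean distance. This brute-force scan is not GRAB-ANNS. The function name `hybrid_search` and its parameters are hypothetical. Approximate indices such as the graph index the paper proposes aim to approach these results without scanning every item.

```python
import heapq
import math
from typing import Callable, Sequence


def hybrid_search(
    vectors: Sequence[Sequence[float]],
    attrs: Sequence[float],
    query: Sequence[float],
    predicate: Callable[[float], bool],
    k: int,
) -> list[tuple[int, float]]:
    """Exact pre-filtering baseline: filter by predicate, then take the top-k by L2 distance.

    Returns (item_id, distance) pairs in ascending distance order.
    """
    # Structured filter first: only items passing the predicate become candidates.
    candidates = (
        (i, math.dist(v, query))
        for i, (v, a) in enumerate(zip(vectors, attrs))
        if predicate(a)
    )
    # Vector similarity second: keep the k closest candidates.
    return heapq.nsmallest(k, candidates, key=lambda t: t[1])


if __name__ == "__main__":
    vecs = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
    prices = [10.0, 20.0, 30.0, 40.0]
    # Item 0 is the closest vector but fails the range predicate.
    print(hybrid_search(vecs, prices, [0.0, 0.0], lambda p: 15 <= p <= 35, k=2))
```

The per-item `if predicate(a)` is a data-dependent branch. On a GPU, neighbouring threads evaluating items with different outcomes would take different paths, which is the kind of branch divergence the abstract cites as a cost of porting CPU-centric hybrid search directly.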
