Ascend-RaBitQ: Heterogeneous NPU-CPU Acceleration of Billion-Scale Similarity Search with 1-bit Quantization
Fujun He, Chuyue Ye, Huaxiang Cai, Zetao Lv, Baolong Cui, Wenru Yan, Chao Zhan, Zigang Zhang, Hao Yi, Jie Xiang, Xiabing Li, Yuhang Gai, Ziyang Zhang, Pengfei Zheng, Yunfei Du

TL;DR
Ascend-RaBitQ introduces a heterogeneous NPU-CPU system for billion-scale vector similarity search. It decouples coarse ranking (on the NPU) from fine ranking (on the CPU) to balance accuracy, memory footprint, and performance.
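The decoupling of coarse and fine ranking can be illustrated with a generic two-stage search. This is a minimal sketch that assumes two things. First, the coarse stage ranks every candidate by a cheap approximate distance. Second, the fine stage recomputes exact squared L2 distances on full-precision vectors for the survivors; the abstract does not say what data the fine stage actually uses. The names `two_stage_search`, `coarse_score`, and `rerank_k` are hypothetical. Both stages run as plain Python here, not on an NPU and a CPU. The approximate scorer is passed in as a function, so any estimator can be plugged in, including one computed from 1-bit codes.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def sq_l2(a: Vector, b: Vector) -> float:
    """Exact squared Euclidean distance between two full-precision vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_stage_search(
    query: Vector,
    vectors: List[Vector],
    coarse_score: Callable[[Vector, int], float],
    rerank_k: int,
    k: int,
) -> List[Tuple[float, int]]:
    """Return the k best (exact_sq_dist, id) pairs, ascending by distance."""
    # Coarse ranking (the NPU-side stage in the paper's design; a Python call here):
    # score every id with the cheap approximate distance and keep rerank_k survivors.
    candidates = heapq.nsmallest(
        rerank_k, range(len(vectors)), key=lambda i: coarse_score(query, i)
    )
    # Fine ranking (the CPU-side stage): exact distances only for the survivors.
    exact = [(sq_l2(query, vectors[i]), i) for i in candidates]
    return heapq.nsmallest(k, exact)
```

As an example, consider `two_stage_search(q, data, lambda query, i: sq_l2(query[:4], data[i][:4]), rerank_k=40, k=5)`. It uses a deliberately crude proxy for the coarse stage: distance on the first four dimensions only. The returned distances are still exact. Raising `rerank_k` gives up some of the coarse stage's savings in exchange for higher recall.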
Contribution
It presents the first heterogeneous NPU-CPU optimized IVF-RaBitQ system, enabling efficient billion-scale similarity search by leveraging hardware-specific optimizations.
Findings
Achieves up to 62.8x faster index construction than the CPU baseline.
Provides up to 4.6x higher throughput than the fastest CPU implementation.
Demonstrates scalability on multi-NPU systems.
Abstract
Vector similarity search is a critical component of modern AI systems, but traditional CPU-based implementations face fundamental scalability bottlenecks for billion-scale corpora due to prohibitive computational overhead and memory bandwidth limitations. While Neural Processing Units (NPUs) offer orders-of-magnitude higher compute density, existing CPU/GPU-optimized 1-bit RaBitQ quantization implementations cannot be directly ported to NPU architectures due to fundamental hardware mismatches, and homogeneous design paradigms struggle to simultaneously balance accuracy, memory footprint, and performance. This paper presents Ascend-RaBitQ, the first heterogeneous NPU-CPU optimized IVF-RaBitQ system for billion-scale vector search, built on the core insight that decoupling coarse ranking (NPU) from fine ranking (CPU) allows each stage to leverage its optimal hardware, breaking the…
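The abstract builds on IVF-RaBitQ's 1-bit quantization but does not describe the distance estimator. The sketch below is a minimal pure-Python estimator in the style of RaBitQ, assuming the formulation of the original RaBitQ work. Each vector's residual to its IVF centroid is normalized, rotated, and stored as one sign bit per dimension plus two floats: the residual norm and ⟨x̄, x⟩, where x̄ has entries ±1/√D. At query time, the inner product of the code with the rotated query residual, divided by ⟨x̄, x⟩, estimates the true inner product. The rotation used here is an illustrative stand-in: random sign flips followed by a normalized Walsh-Hadamard transform, which requires a power-of-two dimension. All function names are hypothetical, and nothing here reflects Ascend-RaBitQ's actual NPU kernels.

```python
import math
import random


def fwht(v):
    """Normalized fast Walsh-Hadamard transform; len(v) must be a power of two."""
    a = list(v)
    n = len(a)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in a]


def make_rotation(dim, seed=0):
    """Random sign flips followed by the normalized Hadamard transform (an orthogonal map).
    Illustrative stand-in for the random rotation used by RaBitQ-style quantizers."""
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    return lambda v: fwht([s * x for s, x in zip(signs, v)])


def quantize(vec, centroid, rotate):
    """Encode vec as (bits, residual_norm, <x_bar, x>): 1 bit per dimension plus two floats."""
    residual = [a - b for a, b in zip(vec, centroid)]
    norm = math.sqrt(sum(r * r for r in residual))
    if norm == 0.0:
        return [0] * len(vec), 0.0, 1.0  # vector sits on the centroid; bits are unused
    x = rotate([r / norm for r in residual])  # unit residual in the rotated space
    bits = [1 if xi > 0 else 0 for xi in x]
    # x_bar_i = +1/sqrt(D) if bit else -1/sqrt(D), so <x_bar, x> = sum(|x_i|) / sqrt(D)
    ip_xbar_x = sum(abs(xi) for xi in x) / math.sqrt(len(x))
    return bits, norm, ip_xbar_x


def estimate_sq_dist(code, query, centroid, rotate):
    """Estimate ||vec - query||^2 from the 1-bit code of vec and the full-precision query."""
    bits, o_norm, ip_xbar_x = code
    q_res = [a - b for a, b in zip(query, centroid)]
    q_norm = math.sqrt(sum(r * r for r in q_res))
    if o_norm == 0.0 or q_norm == 0.0:
        return o_norm ** 2 + q_norm ** 2  # cross term vanishes
    y = rotate([r / q_norm for r in q_res])  # unit query residual, same rotation
    ip_xbar_y = sum(yi if b else -yi for b, yi in zip(bits, y)) / math.sqrt(len(y))
    est_ip = ip_xbar_y / ip_xbar_x  # estimate of <o, q> between the unit residuals
    # ||o_r - q_r||^2 = ||o_r - c||^2 + ||q_r - c||^2 - 2 ||o_r - c|| ||q_r - c|| <o, q>
    return o_norm ** 2 + q_norm ** 2 - 2.0 * o_norm * q_norm * est_ip
```

For clarity, `estimate_sq_dist` recomputes the rotated query residual on every call. An IVF index would compute it once per probed cluster and reuse it for all 1-bit codes in that cluster. That batched pass over packed codes is the kind of work the coarse-ranking stage is meant to offload.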
