FusionANNS: An Efficient CPU/GPU Cooperative Processing Architecture for Billion-scale Approximate Nearest Neighbor Search
Bing Tian, Haikun Liu, Yuhang Tang, Shihai Xiao, Zhuohui Duan, Xiaofei Liao, Xuecang Zhang, Junhua Zhu, Yu Zhang

TL;DR
FusionANNS introduces a CPU/GPU cooperative architecture for billion-scale approximate nearest neighbor search, significantly improving query speed and cost efficiency by reducing I/O bottlenecks through innovative multi-tiered indexing and re-ranking techniques.
Contribution
It presents novel CPU/GPU collaborative filtering and re-ranking mechanisms, along with three design innovations, to enhance performance and efficiency in large-scale ANNS systems using minimal hardware.
Findings
Achieves 9.4-13.1X higher QPS than SPANN
Attains 2-4.9X higher QPS than RUMMY
Maintains low latency and high accuracy
Abstract
Approximate nearest neighbor search (ANNS) has emerged as a crucial component of database and AI infrastructure. Ever-increasing vector datasets pose significant challenges in terms of performance, cost, and accuracy for ANNS services. No modern ANNS system addresses all of these issues simultaneously. We present FusionANNS, a high-throughput, low-latency, cost-efficient, and high-accuracy ANNS system for billion-scale datasets using SSDs and only one entry-level GPU. The key idea of FusionANNS lies in CPU/GPU collaborative filtering and re-ranking mechanisms, which significantly reduce I/O operations across CPUs, GPU, and SSDs to break through the I/O performance bottleneck. Specifically, we propose three novel designs: (1) multi-tiered indexing to avoid data swapping between CPUs and GPU, (2) heuristic re-ranking to eliminate unnecessary I/Os and computations while guaranteeing high…
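The filter-then-re-rank idea named in the abstract can be illustrated with a small, pure-Python sketch. This is not the FusionANNS implementation: the grid quantizer, the in-memory `raw_store` dictionary standing in for SSD-resident vectors, the function names, and the early-stop rule (stop once a mini-batch leaves the top-k unchanged) are all assumptions made for illustration. The sketch only shows why re-ranking in mini-batches can avoid fetching every candidate's raw vector.

```python
import heapq
import random


def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(v, step=0.5):
    """Crude stand-in for a compressed vector: snap each coordinate to a grid."""
    return [round(x / step) * step for x in v]


def make_dataset(n, dim, seed=0):
    """Build hypothetical raw vectors and their compressed counterparts."""
    rng = random.Random(seed)
    raw = {i: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for i in range(n)}
    compressed = {i: quantize(v) for i, v in raw.items()}
    return raw, compressed


def search(query, compressed, raw_store, k=3, n_candidates=20, batch=5):
    """Two-stage search: approximate filtering, then batched exact re-ranking.

    Returns (top-k ids, number of raw vectors fetched).
    """
    # Stage 1 (filter): rank ids by distance to the compressed vectors only.
    candidates = heapq.nsmallest(
        n_candidates, compressed, key=lambda i: l2(query, compressed[i])
    )

    # Stage 2 (re-rank): fetch raw vectors in mini-batches, best candidates first.
    best = []      # current top-k as (exact distance, id) pairs
    fetched = 0    # counts simulated "I/Os" against raw_store
    for start in range(0, len(candidates), batch):
        ids = candidates[start:start + batch]
        fetched += len(ids)
        scored = [(l2(query, raw_store[i]), i) for i in ids]
        new_best = heapq.nsmallest(k, best + scored)
        # Assumed heuristic: a batch that changes nothing ends the re-ranking.
        if new_best == best:
            break
        best = new_best
    return [i for _, i in best], fetched
```

Calling `search(query, compressed, raw)` on data from `make_dataset(200, 8)` returns the ids of the k nearest candidates found and the count of raw vectors read; when the early stop fires, that count is smaller than `n_candidates`, which is the I/O saving the sketch is meant to show.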
Taxonomy
Topics: Face and Expression Recognition · Advanced Image and Video Retrieval Techniques · Data Management and Algorithms
Methods: Convolution · Concatenated Skip Connection · Max Pooling · U-Net · Self-Supervised Deep Supervision
