A Real-Time Adaptive Multi-Stream GPU System for Online Approximate Nearest Neighborhood Search
Yiping Sun, Yang Shi, Jiaolong Du

TL;DR
This paper presents RTAMS-GANNS, a GPU-based system for real-time, parallel approximate nearest neighbor search with dynamic vector insertion. It reduces latency by up to 80% and supports large-scale industrial applications.
Contribution
The paper introduces a novel multi-stream GPU architecture with dynamic insertion algorithms, targeting the real-time insertion requirements that existing, offline-oriented systems do not meet.
Findings
Reduces latency by up to 80% across datasets.
Supports real-time insertion without blocking execution.
Successfully deployed in large-scale industrial systems.
Abstract
In recent years, Approximate Nearest Neighbor Search (ANNS) has played a pivotal role in modern search and recommendation systems, especially in emerging LLM applications like Retrieval-Augmented Generation. There is a growing exploration into harnessing the parallel computing capabilities of GPUs to meet the substantial demands of ANNS. However, existing systems primarily focus on offline scenarios, overlooking the distinct requirements of online applications that necessitate real-time insertion of new vectors. This limitation renders such systems inefficient for real-world scenarios. Moreover, previous architectures struggled to effectively support real-time insertion due to their reliance on serial execution streams. In this paper, we introduce a novel Real-Time Adaptive Multi-Stream GPU ANNS System (RTAMS-GANNS). Our architecture achieves its objectives through three key…
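The abstract attributes the difficulty of real-time insertion to serial execution, where an insert and a search must wait on each other. As a loose CPU-side analogy only, and not the RTAMS-GANNS algorithm (whose three key components are elided above), the sketch below shows one way an insert can avoid blocking a search: writers append to preallocated storage and then publish a new size, while readers work from a snapshot of that size without taking a lock. The class `AppendOnlyIndex`, its methods, and the exact brute-force search are all hypothetical stand-ins chosen for illustration, and the write-then-publish ordering relies on CPython's global interpreter lock.

```python
import threading


class AppendOnlyIndex:
    """Toy exact-search index (hypothetical); a stand-in for a GPU ANNS index."""

    def __init__(self, capacity, dim):
        self.dim = dim
        self._slots = [None] * capacity       # preallocated storage
        self._size = 0                        # number of published vectors
        self._write_lock = threading.Lock()   # serializes writers only

    def insert(self, vec):
        """Append a vector and return its id. Searches never wait on this."""
        if len(vec) != self.dim:
            raise ValueError("dimension mismatch")
        with self._write_lock:
            idx = self._size
            if idx >= len(self._slots):
                raise OverflowError("index full")
            self._slots[idx] = tuple(vec)     # write the slot first...
            self._size = idx + 1              # ...then publish it to readers
            return idx

    def search(self, query, k):
        """Return ids of the k nearest published vectors (squared L2)."""
        n = self._size                        # snapshot; later inserts are ignored
        scored = [
            (sum((a - b) ** 2 for a, b in zip(self._slots[i], query)), i)
            for i in range(n)
        ]
        scored.sort()
        return [i for _, i in scored[:k]]


if __name__ == "__main__":
    index = AppendOnlyIndex(capacity=1024, dim=2)
    index.insert([0.0, 0.0])

    # One thread keeps inserting while the main thread keeps searching.
    writer = threading.Thread(
        target=lambda: [index.insert([float(i), float(i)]) for i in range(1, 500)]
    )
    writer.start()
    for _ in range(100):
        index.search([3.0, 3.0], k=5)         # takes no lock
    writer.join()
    print(index.search([3.0, 3.0], k=1))
```

A search that starts mid-insert simply sees the index as it was when it took its snapshot. A real GPU system would also have to handle graph or cluster updates, memory management, and stream scheduling, none of which this sketch attempts.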
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Metaheuristic Optimization Algorithms Research · Video Surveillance and Tracking Methods
