SIVF: GPU-Resident IVF Index for Streaming Vector Search
Dongfang Zhao

TL;DR
SIVF is a GPU-native, mutable IVF index that supports real-time updates and high-throughput large-scale vector search, cutting update latency and improving scalability in streaming scenarios.
Contribution
The paper presents SIVF, a GPU-resident IVF index whose new data structures and algorithms allow in-place mutation, addressing the reliance of existing GPU IVF designs on static VRAM layouts.
Findings
Reduces deletion latency by orders of magnitude compared with state-of-the-art baselines.
Achieves near-linear scalability on a 12-GPU cluster.
Supports high-velocity vector ingestion and deletion at millions per second.
Abstract
The GPU-accelerated Inverted File (IVF) index is an industry standard for large-scale vector search, but it relies on static VRAM layouts that hinder real-time mutability. Our benchmarks and analysis reveal that existing GPU IVF designs require expensive CPU-GPU data transfers for index updates, causing system latency to spike from milliseconds to seconds in streaming scenarios. We present SIVF, a GPU-native index that enables high-velocity, in-place mutation via a series of new data structures and algorithms, such as conflict-free slab allocation and coalesced search on non-contiguous memory. SIVF has been implemented and integrated into the open-source vector search library Faiss. Evaluation against baselines on diverse vector datasets demonstrates that SIVF reduces deletion latency by orders of magnitude compared to the state of the art. Furthermore, distributed…
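To make the idea of in-place mutation over non-contiguous memory concrete, the following is a minimal single-threaded CPU sketch in Python. It is not the SIVF implementation and does not use the Faiss API: the names `Slab`, `SlabIVF`, and `SLAB_SIZE` are hypothetical, the tombstone-based delete is an assumed design, and the GPU-specific parts of the abstract (conflict-free allocation across threads, coalesced memory access) are not modeled. It only illustrates how an inverted list stored as a chain of fixed-size slabs can grow and shrink without rebuilding or copying the list.

```python
# Illustrative CPU sketch only; NOT the SIVF implementation or the Faiss API.
# Slab, SlabIVF and SLAB_SIZE are hypothetical names chosen for this example.
from dataclasses import dataclass, field

SLAB_SIZE = 4  # slots per slab (assumed; a real GPU slab would be far larger)


@dataclass
class Slab:
    ids: list = field(default_factory=lambda: [None] * SLAB_SIZE)
    vecs: list = field(default_factory=lambda: [None] * SLAB_SIZE)
    valid: list = field(default_factory=lambda: [False] * SLAB_SIZE)
    used: int = 0  # number of slots ever written in this slab


class SlabIVF:
    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]  # one chain of slabs per inverted list
        self.where = {}  # vector id -> (list number, slab number, slot)

    @staticmethod
    def _dist(a, b):
        # Squared Euclidean distance.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def _ranked_lists(self, v):
        # Inverted lists ordered by centroid distance to v.
        return sorted(range(len(self.centroids)),
                      key=lambda c: self._dist(v, self.centroids[c]))

    def add(self, vid, v):
        c = self._ranked_lists(v)[0]
        chain = self.lists[c]
        if not chain or chain[-1].used == SLAB_SIZE:
            chain.append(Slab())  # grow by one slab; existing slabs never move
        slab = chain[-1]
        slot = slab.used
        slab.ids[slot], slab.vecs[slot], slab.valid[slot] = vid, v, True
        slab.used += 1
        self.where[vid] = (c, len(chain) - 1, slot)

    def remove(self, vid):
        c, s, slot = self.where.pop(vid)
        self.lists[c][s].valid[slot] = False  # in-place tombstone, no rebuild

    def search(self, q, k, nprobe=1):
        hits = []
        for c in self._ranked_lists(q)[:nprobe]:
            for slab in self.lists[c]:  # walk the non-contiguous slab chain
                for slot in range(slab.used):
                    if slab.valid[slot]:  # skip deleted entries
                        hits.append((self._dist(q, slab.vecs[slot]), slab.ids[slot]))
        hits.sort()
        return [vid for _, vid in hits[:k]]


index = SlabIVF(centroids=[[0.0, 0.0], [10.0, 10.0]])
for i in range(6):
    index.add(i, [float(i), float(i)])
before = index.search([1.1, 1.1], k=2)
index.remove(1)
after = index.search([1.1, 1.1], k=2)
print(before, after)
```

In this sketch a deletion touches a single validity flag, so its cost does not depend on the size of the list. Reclaiming tombstoned slots and coordinating concurrent writers are left out and would need separate mechanisms.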
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Database Systems and Queries · Graph Theory and Algorithms
