LSM-VEC: A Large-Scale Disk-Based System for Dynamic Vector Search
Shurui Zhong, Dingheng Mo, Siqiang Luo

TL;DR
LSM-VEC is a disk-based index that combines hierarchical graph indexing with LSM-tree storage to enable efficient, scalable, and dynamic vector search on billion-scale datasets, outperforming existing disk-based ANN systems in recall and latency.
Contribution
The paper introduces a hierarchical graph index integrated with LSM-tree storage, supporting out-of-place updates and adaptive search strategies for large-scale dynamic vector search.
Findings
Outperforms existing disk-based ANN systems in recall and latency.
Reduces memory footprint by over 66.2%.
Demonstrates effectiveness on billion-scale datasets.
Abstract
Vector search underpins modern AI applications by supporting approximate nearest neighbor (ANN) queries over high-dimensional embeddings in tasks like retrieval-augmented generation (RAG), recommendation systems, and multimodal search. Traditional ANN search indices (e.g., HNSW) are limited by memory constraints at large data scale. Disk-based indices such as DiskANN reduce memory overhead but rely on offline graph construction, resulting in costly and inefficient vector updates. The state-of-the-art clustering-based approach SPFresh offers better scalability but suffers from reduced recall due to coarse partitioning. Moreover, SPFresh employs in-place updates to maintain its index structure, limiting its efficiency in handling high-throughput insertions and deletions under dynamic workloads. This paper presents LSM-VEC, a disk-based dynamic vector index that integrates hierarchical…
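The contrast between in-place and out-of-place updates can be illustrated with a toy example. The sketch below is not LSM-VEC's implementation; it is a minimal in-memory model of the general LSM-tree idea applied to graph adjacency lists, assuming a small write buffer that is flushed into immutable runs. All names (`ToyLSMAdjacency`, `memtable_limit`, `compact`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Neighbors = Optional[Tuple[int, ...]]  # None acts as a tombstone (deleted node)


class ToyLSMAdjacency:
    """Toy LSM-style store mapping a node id to its neighbor list.

    Illustrative assumption only: real LSM-trees keep runs on disk and
    compact level by level; here every run is an in-memory dict.
    """

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[int, Neighbors] = {}   # mutable write buffer
        self.runs: List[Dict[int, Neighbors]] = [] # immutable runs, newest first
        self.memtable_limit = memtable_limit

    def put(self, node: int, neighbors: List[int]) -> None:
        # Out-of-place update: write a new version, never touch old runs.
        self.memtable[node] = tuple(neighbors)
        self._maybe_flush()

    def delete(self, node: int) -> None:
        # Deletion is also a write: record a tombstone that shadows old versions.
        self.memtable[node] = None
        self._maybe_flush()

    def get(self, node: int) -> Neighbors:
        # Search newest to oldest; the first hit is the current version.
        for table in [self.memtable, *self.runs]:
            if node in table:
                return table[node]
        return None

    def _maybe_flush(self) -> None:
        # When the buffer is full, freeze it as a new sorted run.
        if len(self.memtable) >= self.memtable_limit:
            self.runs.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def compact(self) -> None:
        # Full compaction: merge all runs oldest to newest so newer versions
        # win, then drop tombstones, which no longer shadow anything.
        merged: Dict[int, Neighbors] = {}
        for run in reversed(self.runs):
            merged.update(run)
        live = {k: v for k, v in sorted(merged.items()) if v is not None}
        self.runs = [live]
```

In this model an insertion or deletion only appends a new version, so writes stay cheap and sequential, while stale versions are reclaimed later by `compact`. An in-place design would instead have to locate and rewrite the stored neighbor list on every update.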
