Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory
Rongxin Cheng, Yifan Peng, Xingda Wei, Hongrui Xie, Rong Chen, Sijie Shen, Haibo Chen

TL;DR
This paper investigates the performance and storage trade-offs in billion-scale vector search indexes on SSDs and introduces a new approach leveraging second-tier memory to significantly reduce index size while maintaining high performance.
Contribution
It characterizes the limitations of SSD-based vector indexes and proposes a novel index design optimized for second-tier memory, achieving smaller size and better performance.
Findings
SSD-based indexes pay high storage amplification to improve throughput.
Second-tier memory enables smaller, more efficient vector indexes.
Index performance results on SSDs contradict those on second-tier memory.
Abstract
Vector searches on large-scale datasets are critical to modern online services like web search and RAG, which necessitates storing the datasets and their indexes on secondary storage like SSDs. In this paper, we are the first to characterize the trade-off between performance and index size in existing SSD-based graph and cluster indexes: to improve throughput by 5.7× and 1.7×, these indexes have to pay 5.8× and 7.7× storage amplification with respect to the dataset size, respectively. The root cause is that the coarse-grained access of SSDs mismatches the fine-grained random reads required by vector indexes with small amplification. This paper argues that second-tier memory, such as remote DRAM/NVM connected via RDMA or CXL, is a powerful storage medium for addressing the problem from a system's perspective, thanks to its fine-grained access granularity. However, putting…
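The granularity mismatch named in the abstract can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the 4 KiB SSD block, the 64 B fine-grained access unit, and the node layout (a 128-dimensional 1-byte vector plus 32 four-byte neighbor ids) are all illustrative assumptions, and the function names are hypothetical.

```python
import math


def bytes_fetched(payload_bytes: int, granularity_bytes: int) -> int:
    """Bytes a device transfers to serve one random read of payload_bytes,
    assuming the read is aligned and rounded up to whole access units."""
    return math.ceil(payload_bytes / granularity_bytes) * granularity_bytes


def read_amplification(payload_bytes: int, granularity_bytes: int) -> float:
    """Ratio of bytes transferred to bytes the index actually needs."""
    return bytes_fetched(payload_bytes, granularity_bytes) / payload_bytes


# Assumed graph node: 128-dim uint8 vector + 32 neighbor ids of 4 bytes each.
NODE_BYTES = 128 * 1 + 32 * 4  # 256 bytes

SSD_BLOCK_BYTES = 4096    # assumed SSD access granularity
FINE_GRAINED_BYTES = 64   # assumed second-tier memory access granularity

ssd_amp = read_amplification(NODE_BYTES, SSD_BLOCK_BYTES)
mem_amp = read_amplification(NODE_BYTES, FINE_GRAINED_BYTES)

print(f"SSD read amplification:         {ssd_amp:.1f}x")
print(f"Second-tier read amplification: {mem_amp:.1f}x")
```

Under these assumed numbers, a random read of one node from the SSD transfers a full block of which only a small fraction is used, while the fine-grained device transfers only the node itself. This shows per-read transfer waste only; how each index converts that waste into extra stored data is described in the paper itself, not here.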
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Neural Networks and Applications
Methods: Attention Is All You Need · Weight Decay · Attention Dropout · Dropout · Residual Connection · Softmax · WordPiece · Linear Layer · Byte Pair Encoding
