Quantization for Vector Search under Streaming Updates
Ishaq Aden-Ali, Hakan Ferhatosmanoglu, Alexander Greaves-Tunnell, Nina Mishra, Tal Wagner

TL;DR
This paper introduces a new dynamic quantization method for large-scale vector search that maintains high accuracy and efficiency under streaming data updates, avoiding costly index rebuilds.
Contribution
It provides a theoretical framework and a practical algorithm for data-dependent quantization that stays consistent and accurate as points are inserted into and deleted from the dataset.
Findings
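The proposed method outperforms baseline methods in large-scale streaming nearest neighbor search
Proves that static data-dependent quantization can be made dynamic with bounded disk I/O per update while retaining formal accuracy guarantees for ANN search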
Develops a practical, adaptive quantization algorithm
Abstract
Large-scale vector databases for approximate nearest neighbor (ANN) search typically store a quantized dataset in main memory for fast access, and full precision data on remote disk. State-of-the-art ANN quantization methods are highly data-dependent, rendering them unable to handle point insertions and deletions. This either leads to degraded search quality over time, or forces costly global rebuilds of the entire search index. In this paper, we formally study data-dependent quantization under streaming dataset updates. We formulate a computation model of limited remote disk access and define a dynamic consistency property that guarantees freshness under updates. We use it to obtain the following results: Theoretically, we prove that static data-dependent quantization can be made dynamic with bounded disk I/O per update while retaining formal accuracy guarantees for ANN search.…
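The setup the abstract describes (quantized codes in memory, full-precision vectors on disk, a quantizer fitted to the data) can be illustrated with a small sketch. This is not the paper's algorithm; it only shows the failure mode that motivates it, where a codebook fitted once becomes stale as new points arrive. Every name here (`StaticQuantizedStore`, `fit_codebook`, the plain k-means codebook, the dictionary standing in for remote disk) is an assumption made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))

def fit_codebook(points, k, iters=10, seed=0):
    """Plain k-means (Lloyd iterations): a simple data-dependent quantizer."""
    rng = random.Random(seed)
    codebook = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(codebook, p)].append(p)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old codeword if no point was assigned
                codebook[i] = [sum(c) / len(bucket) for c in zip(*bucket)]
    return codebook

class StaticQuantizedStore:
    """Hypothetical store: codes in memory, full vectors on 'disk' (a dict)."""

    def __init__(self, points, k):
        self.codebook = fit_codebook(points, k)  # fitted once, never updated
        self.disk = {}    # id -> full-precision vector (stand-in for remote disk)
        self.codes = {}   # id -> codeword index (kept in main memory)
        self.next_id = 0
        for p in points:
            self.insert(p)

    def insert(self, v):
        """Encode v with the existing (possibly stale) codebook."""
        pid = self.next_id
        self.next_id += 1
        self.disk[pid] = list(v)
        self.codes[pid] = nearest(self.codebook, v)
        return pid

    def delete(self, pid):
        """Drop a point; the codebook is left untouched."""
        self.disk.pop(pid)
        self.codes.pop(pid)

    def mean_error(self):
        """Average squared quantization error (reads 'disk' for measurement only)."""
        total = sum(sq_dist(self.codebook[c], self.disk[pid])
                    for pid, c in self.codes.items())
        return total / len(self.codes)

rng = random.Random(1)
initial = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
store = StaticQuantizedStore(initial, k=4)
err_before = store.mean_error()

# Stream in points from a shifted distribution without refitting the codebook.
for _ in range(200):
    store.insert([rng.gauss(10, 1), rng.gauss(10, 1)])
err_after = store.mean_error()

print(f"mean error before shift: {err_before:.2f}, after shift: {err_after:.2f}")
```

In this toy setting the only remedy is to refit the codebook over all stored vectors, which corresponds to the costly global rebuild the abstract mentions.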
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Information Retrieval and Search Behavior · Advanced Database Systems and Queries
