Decoupling Vector Data and Index Storage for Space Efficiency
Yuanming Ren, Juncheng Zhang, Yanjing Ren, Rui Yang, Di Wu, and Patrick P. C. Lee

TL;DR
COMPASS is a storage framework that decouples vector data from index metadata, enabling lossless, component-specific compression in large-scale disk-resident graph approximate nearest neighbor search (ANNS) systems with minimal performance impact.
Contribution
COMPASS introduces a component-aware compression approach that exploits the distinct compressibility of vector data and index metadata, reducing storage space while preserving search and update performance.
Findings
Reduces storage space by up to 58.7% on real-world billion-scale datasets.
Achieves improved or competitive search and update performance.
Effectively exploits component-specific compressibility characteristics.
Abstract
Managing large-scale vector datasets with disk-resident graph approximate nearest neighbor search (ANNS) systems incurs substantial storage overhead due to the co-location of vector data and auxiliary index metadata, which prevents the storage layer from exploiting their distinct compressibility. We present COMPASS, a component-aware compressed storage framework for disk-resident graph vector search. Leveraging data-index decoupling as a foundation, COMPASS losslessly compresses each component according to its distinct compressibility characteristics, thereby significantly reducing storage space. It further adapts the search and update paths to preserve their performance under compressed storage layouts. Evaluation on real-world public and proprietary billion-scale datasets shows that COMPASS reduces storage space by up to 58.7%, while delivering improved or competitive search and…
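To make the decoupling idea concrete, here is a minimal Python sketch of storing the two components separately and compressing each with its own lossless codec. It is an illustration under stated assumptions, not the COMPASS implementation: the abstract does not name the codecs used, so zlib stands in for both, delta encoding of sorted neighbor IDs is an assumed index-side transform, and all function names are hypothetical. The sketch also assumes neighbor order within an adjacency list does not matter and that IDs fit in 32 bits.

```python
import struct
import zlib
from typing import List


def compress_vectors(vectors: List[List[float]]) -> bytes:
    """Pack vectors as little-endian float32 and compress them (zlib is a stand-in codec)."""
    dim = len(vectors[0]) if vectors else 0
    raw = bytearray(struct.pack("<II", len(vectors), dim))
    for v in vectors:
        raw += struct.pack(f"<{dim}f", *v)
    return zlib.compress(bytes(raw))


def decompress_vectors(blob: bytes) -> List[List[float]]:
    """Inverse of compress_vectors; lossless with respect to the float32 representation."""
    raw = zlib.decompress(blob)
    n, dim = struct.unpack_from("<II", raw, 0)
    offset = 8
    out = []
    for _ in range(n):
        out.append(list(struct.unpack_from(f"<{dim}f", raw, offset)))
        offset += 4 * dim
    return out


def compress_index(adjacency: List[List[int]]) -> bytes:
    """Delta-encode each sorted neighbor list, then compress (assumed index-side codec)."""
    raw = bytearray(struct.pack("<I", len(adjacency)))
    for neighbors in adjacency:
        ids = sorted(neighbors)  # assumption: neighbor order carries no meaning
        raw += struct.pack("<I", len(ids))
        prev = 0
        for nid in ids:
            raw += struct.pack("<I", nid - prev)  # small gaps compress well
            prev = nid
    return zlib.compress(bytes(raw))


def decompress_index(blob: bytes) -> List[List[int]]:
    """Inverse of compress_index; returns each neighbor list in ascending ID order."""
    raw = zlib.decompress(blob)
    (n,) = struct.unpack_from("<I", raw, 0)
    offset = 4
    adjacency = []
    for _ in range(n):
        (count,) = struct.unpack_from("<I", raw, offset)
        offset += 4
        ids, prev = [], 0
        for _ in range(count):
            (delta,) = struct.unpack_from("<I", raw, offset)
            offset += 4
            prev += delta
            ids.append(prev)
        adjacency.append(ids)
    return adjacency
```

Because the two blobs are produced independently, each codec can be chosen or tuned for its component without affecting the other, which a co-located layout that interleaves vectors and neighbor lists in the same disk blocks does not allow.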
