Disk-Resident Graph ANN Search: An Experimental Evaluation
Xiaoyu Chen, Jinxiu Qu, Yitong Song, Shuhang Lu, Huiling Li, Minghui Jiang, Wei Zhou, Jianliang Xu, Xuanhe Zhou, Fan Wu

TL;DR
This paper systematically evaluates disk-resident graph-based approximate nearest neighbor search methods, revealing key trade-offs and providing practical guidelines for system design under various configurations.
Contribution
It offers a unified taxonomy of design components, detailed experimental analysis, and new insights into performance trade-offs for disk-resident graph ANN systems.
Findings
Vector dimensionality impacts component effectiveness.
Layout strategies have low I/O utilization (~15%).
Page size influences system efficiency and feasibility.
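The dimensionality and page-size findings can be made concrete with back-of-the-envelope arithmetic. The sketch below is illustrative and not taken from the paper: it assumes a hypothetical on-disk record that stores each node's float32 vector together with a 4-byte neighbor count and 4-byte neighbor IDs, and it treats "useful fraction" as the share of a fetched page occupied by nodes the query needs. All function names, parameters, and default values are assumptions made for this example.

```python
def node_record_bytes(dim, max_degree, bytes_per_value=4, bytes_per_id=4):
    """Size of one node record under the assumed layout:
    the vector, one neighbor-count field, and max_degree neighbor IDs."""
    return dim * bytes_per_value + bytes_per_id * (1 + max_degree)


def nodes_per_page(dim, max_degree, page_bytes=4096):
    """Number of whole node records that fit in one page.
    A result of 0 means a single record is larger than the page."""
    return page_bytes // node_record_bytes(dim, max_degree)


def useful_fraction(dim, max_degree, page_bytes=4096, nodes_used=1):
    """Share of a fetched page's bytes belonging to nodes the query needs.
    Returns None when a record does not fit in one page."""
    per_page = nodes_per_page(dim, max_degree, page_bytes)
    if per_page == 0:
        return None
    used = min(nodes_used, per_page)
    return used * node_record_bytes(dim, max_degree) / page_bytes


if __name__ == "__main__":
    # Hypothetical configurations: (dimension, maximum out-degree, page size in bytes).
    for dim, degree, page in [(128, 64, 4096), (960, 64, 4096), (960, 64, 8192)]:
        print(dim, degree, page,
              node_record_bytes(dim, degree),
              nodes_per_page(dim, degree, page),
              useful_fraction(dim, degree, page))
```

Under these assumptions, a 128-dimensional node with 64 neighbors takes 772 bytes, so five records share a 4 KB page and a read that needs only one of them uses under a fifth of the bytes fetched. A 960-dimensional node with the same degree takes 4,100 bytes and no longer fits in a 4 KB page at all, which is one way page size can affect feasibility as well as efficiency.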
Abstract
As data volumes grow while memory capacity remains limited, disk-resident graph-based approximate nearest neighbor (ANN) methods have become a practical alternative to memory-resident designs, shifting the bottleneck from computation to disk I/O. However, since their technical designs diverge widely across storage, layout, and execution paradigms, a systematic understanding of their fundamental performance trade-offs remains elusive. This paper presents a comprehensive experimental study of disk-resident graph-based ANN methods. First, we decompose such systems into five key technical components, i.e., storage strategy, disk layout, cache management, query execution, and update mechanism, and build a unified taxonomy of existing designs across these components. Second, we conduct fine-grained evaluations of representative strategies for each technical component to analyze the trade-offs…
Taxonomy
Topics: Advanced Data Storage Technologies · Cloud Computing and Resource Management · Caching and Content Delivery
