OrchANN: A Unified I/O Orchestration Framework for Skewed Out-of-Core Vector Search
Chengying Huan, Lizheng Chen, Zhengyi Yang, Shaonan Ma, Rong Gu, Renjie Yao, Zhibin Wang, Mingxing Zhang, Fang Xi, Jie Tao, Gang Zhang, Guihai Chen, Chen Tian

TL;DR
OrchANN is a novel out-of-core vector search system that improves performance and efficiency by using a unified I/O orchestration model, adaptive indexing, and multi-level pruning to handle skewed workloads at scale.
Contribution
It introduces a unified I/O orchestration framework with adaptive local indexing and multi-level pruning for skewed out-of-core vector search workloads.
Findings
Outperforms existing systems in QPS and latency.
Reduces SSD accesses significantly.
Achieves up to 17.2x higher QPS and 25.0x lower latency.
Abstract
Approximate nearest neighbor search (ANNS) at billion scale is fundamentally an out-of-core problem: vectors and indexes live on SSD, so performance is dominated by I/O rather than compute. Under skewed semantic embeddings, existing out-of-core systems break down: a uniform local index mismatches cluster scales, static routing misguides queries and inflates the number of probed partitions, and pruning is incomplete at the cluster level and lossy at the vector level, triggering "fetch-to-discard" reranking on raw vectors. We present OrchANN, an out-of-core ANNS engine that uses an I/O orchestration model for unified I/O governance along the route-access-verify pipeline. OrchANN selects a heterogeneous local index per cluster via offline auto-profiling, maintains a query-aware in-memory navigation graph that adapts to skewed workloads, and applies multi-level pruning with geometric…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Caching and Content Delivery
