The Curious Case of High-Dimensional Indexing as a File Structure: A Case Study of eCP-FS
Omar Shahbaz Khan, Gylfi Þór Guðmundsson, Björn Þór Jónsson

TL;DR
This paper introduces eCP-FS, a file-based implementation of a disk-based approximate nearest-neighbor (ANN) index. eCP-FS is more human-readable and portable, and its performance trade-offs are acceptable, especially in memory-constrained environments.
Contribution
The paper proposes a novel file-structured approach to disk-based ANN indexes, enhancing readability and portability, and evaluates its performance against existing state-of-the-art indexes.
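The file-structured idea can be illustrated with a toy, single-level cluster-pruning index in Python. This is a minimal sketch under stated assumptions, not eCP-FS itself. eCP is hierarchical, and the directory layout (`leaders.json`, `clusters/<id>.json`), the JSON encoding, and the functions `build_index` and `search` are all hypothetical. Leaders are sampled at random, every vector is assigned to its nearest leader, and each cluster is written to its own file. A query loads only the small leader file and then opens just the `b` nearest cluster files.

```python
import json
import math
import os
import random
import tempfile


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(root, vectors, n_leaders, seed=0):
    """Write a one-level cluster-pruning index under `root` (hypothetical layout)."""
    rng = random.Random(seed)
    leader_ids = rng.sample(range(len(vectors)), n_leaders)
    leaders = [vectors[i] for i in leader_ids]
    clusters = {c: [] for c in range(n_leaders)}
    for vid, v in enumerate(vectors):
        # Assign each vector to its nearest leader.
        c = min(range(n_leaders), key=lambda j: dist(v, leaders[j]))
        clusters[c].append({"id": vid, "vec": v})
    os.makedirs(os.path.join(root, "clusters"), exist_ok=True)
    with open(os.path.join(root, "leaders.json"), "w") as f:
        json.dump(leaders, f)
    for c, members in clusters.items():
        # One plain-JSON file per cluster, readable with any text editor.
        with open(os.path.join(root, "clusters", f"{c}.json"), "w") as f:
            json.dump(members, f, indent=1)


def search(root, query, k=3, b=2):
    """Return ids of the k nearest vectors found in the b nearest clusters."""
    with open(os.path.join(root, "leaders.json")) as f:
        leaders = json.load(f)  # only the leaders are loaded up front
    nearest = sorted(range(len(leaders)), key=lambda j: dist(query, leaders[j]))[:b]
    candidates = []
    for c in nearest:
        # Only the selected cluster files are read at query time.
        with open(os.path.join(root, "clusters", f"{c}.json")) as f:
            candidates.extend(json.load(f))
    candidates.sort(key=lambda m: dist(query, m["vec"]))
    return [m["id"] for m in candidates[:k]]


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
with tempfile.TemporaryDirectory() as root:
    build_index(root, data, n_leaders=10)
    result = search(root, data[5], k=3, b=3)
print(result)  # the first id is 5: a stored vector is its own nearest neighbour
```

Because each cluster is an ordinary file, the index can be inspected with standard tools, and a query touches only the files of the clusters it selects, which is what keeps memory use low in this sketch. JSON is chosen purely for readability; a practical implementation would likely store vectors in a more compact encoding.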
Findings
eCP-FS is slower than, but competitive with, other indexes.
eCP-FS has a minimal memory footprint in constrained environments.
The approach improves index transparency and ease of analysis.
Abstract
Modern analytical pipelines routinely deploy multiple deep learning and retrieval models that rely on approximate nearest-neighbor (ANN) indexes to support efficient similarity-based search. While many state-of-the-art ANN indexes are memory-based (e.g., HNSW and IVF), running multiple ANN indexes creates competition for limited GPU/CPU memory, which in turn necessitates disk-based index structures (e.g., DiskANN or eCP). In typical index implementations, the main component is a complex data structure that is serialized to disk and read either fully at startup time, for memory-based indexes, or incrementally at query time, for disk-based indexes. Visualizing the index structure or analyzing its quality therefore requires complex code that is either embedded in the index implementation or replicates the code that reads the data structure. In this paper, we consider an alternative…
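The abstract's point about analysis can be made concrete. If an index is laid out as one file per cluster in a plain format, a quality statistic such as cluster-size imbalance needs only a directory listing and a generic parser, not the index's own reader code. The sketch below assumes a hypothetical layout (`clusters/<id>.json`, each file a JSON list of members) and a hypothetical helper `cluster_sizes`; it builds a three-cluster toy directory so that it runs on its own, and it does not reproduce eCP-FS's actual format.

```python
import json
import os
import statistics
import tempfile


def cluster_sizes(root):
    """Map cluster file name -> member count, using only the file system and json."""
    sizes = {}
    cdir = os.path.join(root, "clusters")
    for name in sorted(os.listdir(cdir)):
        with open(os.path.join(cdir, name)) as f:
            sizes[name] = len(json.load(f))
    return sizes


# Toy directory standing in for a real index (hypothetical layout).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "clusters"))
    for cid, n in enumerate([5, 1, 3]):
        members = [{"id": i, "vec": [0.0, 0.0]} for i in range(n)]
        with open(os.path.join(root, "clusters", f"{cid}.json"), "w") as f:
            json.dump(members, f)
    sizes = cluster_sizes(root)

# Ratio of the largest cluster to the mean size: 1.0 means perfectly balanced.
imbalance = max(sizes.values()) / statistics.mean(list(sizes.values()))
print(sizes)  # {'0.json': 5, '1.json': 1, '2.json': 3}
print(f"largest cluster is {imbalance:.2f}x the mean size")
```

The same kind of check on a serialized, monolithic index would first need code that understands its binary format, which is the overhead the abstract describes.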
