Revisiting the Index Construction of Proximity Graph-Based Approximate Nearest Neighbor Search

Shuo Yang, Jiadong Xie, Yingfan Liu, Jeffrey Xu Yu, Xiyue Gao, Qianru Wang, Yanguo Peng, Jiangtao Cui

arXiv:2410.01231 · cs.DB · February 18, 2025 · 2 citations

TL;DR

This paper proposes a new framework to accelerate proximity graph construction for approximate nearest neighbor search, cutting index build time by up to 5.6x without sacrificing search accuracy.

Contribution

It introduces a novel pruning-based construction framework for relative neighborhood graphs (RNG) and navigable small world graphs (NSWG), improving the construction efficiency and scalability of PG-based $k$-ANN methods.
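As a rough illustration of what "pruning" means when building an RNG-style graph, the sketch below implements the classic RNG edge-selection heuristic used by builders such as HNSW and NSG. It is not the authors' new framework; the names `rng_prune` and `max_degree` are hypothetical. Candidates are scanned from nearest to farthest from the point `p`, and a candidate `c` is dropped when some already-kept neighbor `r` satisfies d(c, r) < d(p, c), since `r` then "occludes" `c`.

```python
from __future__ import annotations

import math
from typing import Sequence

Vector = Sequence[float]


def rng_prune(p: Vector, candidates: list[int], points: list[Vector],
              max_degree: int) -> list[int]:
    """Hypothetical helper: RNG-style neighbor selection for point p.

    Candidates are visited from nearest to farthest from p. A candidate c is
    kept only if no already-kept neighbor r is closer to c than p is, i.e.
    c is pruned when dist(c, r) < dist(c, p) for some kept r.
    """
    # Stable sort by distance to p (ties keep their input order).
    ordered = sorted(candidates, key=lambda c: math.dist(points[c], p))
    kept: list[int] = []
    for c in ordered:
        if len(kept) >= max_degree:
            break  # degree budget exhausted
        d_cp = math.dist(points[c], p)
        # Keep c only if no kept neighbor occludes it.
        if all(math.dist(points[c], points[r]) >= d_cp for r in kept):
            kept.append(c)
    return kept


if __name__ == "__main__":
    pts = [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
    print(rng_prune((0.0, 0.0), [0, 1, 2, 3], pts, max_degree=4))  # prints [0, 2, 3]
```

For p = (0, 0), the candidate at (2, 0) is pruned because the kept neighbor (1, 0) is closer to it than p is, while (0, 1) and (-1, 0) survive. Each call costs O(K^2) distance computations for K candidates, which is why the number of candidates examined per point matters for build time.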

Findings

1. Achieves up to 5.6x speedup in graph construction
2. Maintains comparable $k$-ANN search performance
3. Enhances scalability for large high-dimensional datasets

Abstract

Proximity graphs (PG) have gained increasing popularity as the state-of-the-art solution to $k$-approximate nearest neighbor ($k$-ANN) search on high-dimensional data, a fundamental operation in fields such as retrieval-augmented generation. Although PG-based approaches deliver the best $k$-ANN search performance, their index construction cost grows superlinearly with the number of points, which substantially limits their scalability in the era of big data. The goal of this paper is therefore to accelerate the construction of PG-based methods without compromising their $k$-ANN search performance. To this end, we revisit two mainstream categories of PG: the relative neighborhood graph (RNG) and the navigable small world graph (NSWG). Revisiting their construction processes reveals issues that limit construction efficiency. To address these issues, we propose…
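NSWG construction is typically incremental: each new point runs a greedy search on the graph built so far and links to the nearest results it finds, so a build of n points performs n searches over a growing graph. The minimal sketch below assumes a simplified flat NSW (fixed entry point 0, no hierarchy, no degree cap, no pruning); the names `greedy_search`, `build_nswg`, `m`, and `ef` are hypothetical and it is not the paper's method.

```python
from __future__ import annotations

import heapq
import math
from typing import Sequence

Vector = Sequence[float]


def greedy_search(graph: dict[int, set[int]], points: list[Vector],
                  query: Vector, entry: int, ef: int) -> list[int]:
    """Best-first beam search; returns up to ef found nodes, nearest first."""
    d0 = math.dist(points[entry], query)
    visited = {entry}
    frontier = [(d0, entry)]  # min-heap of nodes still to expand
    best = [(-d0, entry)]     # max-heap (negated distances) of top-ef results
    while frontier:
        d, u = heapq.heappop(frontier)
        if d > -best[0][0] and len(best) >= ef:
            break  # nearest unexpanded node is worse than the worst result
        for v in graph[u]:
            if v in visited:
                continue
            visited.add(v)
            dv = math.dist(points[v], query)
            if len(best) < ef or dv < -best[0][0]:
                heapq.heappush(frontier, (dv, v))
                heapq.heappush(best, (-dv, v))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    return [v for _, v in sorted((-nd, v) for nd, v in best)]


def build_nswg(points: list[Vector], m: int, ef: int) -> dict[int, set[int]]:
    """Simplified flat NSW (assumption): insert points one at a time."""
    graph: dict[int, set[int]] = {}
    for i, p in enumerate(points):
        graph[i] = set()
        if i == 0:
            continue
        # One search per insertion over the graph built so far.
        cands = greedy_search(graph, points, p, entry=0, ef=max(ef, m))
        for v in cands[:m]:
            graph[i].add(v)  # undirected edges, as in NSW
            graph[v].add(i)
    return graph


if __name__ == "__main__":
    line = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
    g = build_nswg(line, m=2, ef=4)
    print(sorted(g[4]))  # prints [2, 3]
```

On five points along a line with m = 2, the last point links to its two nearest predecessors. Real NSW/HNSW builders add degree caps and neighbor pruning on top of this loop, but the per-insertion search remains the dominant cost in this simplified form.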


Taxonomy

Topics: Data Management and Algorithms · Rough Sets and Fuzzy Logic · Multi-Criteria Decision Making