A Note on Graph-Based Nearest Neighbor Search
Hongya Wang, Zhizheng Wang, Wei Wang, Yingyuan Xiao, Zeng Zhao, Kaixiang Yang

TL;DR
This paper investigates why graph-based nearest neighbor search algorithms are effective, revealing that local clustering properties significantly influence their efficiency and accuracy.
Contribution
It introduces the idea that the local clustering coefficient affects search performance, and it analyzes the two-phase search process in relation to the maximum strongly connected component (SCC).
Findings
High clustering coefficient correlates with larger maximum SCCs.
The two-phase search algorithm guarantees traversal of the maximum SCC.
Empirical results validate the impact of clustering on search efficiency.
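To make the notion of a maximum SCC concrete, the following is a minimal sketch that finds the largest strongly connected component of a directed graph using Kosaraju's two-pass algorithm. The function name `largest_scc` and the optional `nodes` parameter (which restricts the graph to a chosen subset, for example a set of near neighbors) are illustrative assumptions and are not taken from the paper.

```python
def largest_scc(graph, nodes=None):
    """Return the largest strongly connected component as a set of nodes.

    graph: dict mapping node -> list of out-neighbors (directed edges).
    nodes: optional subset; if given, only the induced subgraph is used.
    (Hypothetical helper for illustration, not the paper's code.)
    """
    keep = set(graph) if nodes is None else set(nodes)
    fwd = {u: [v for v in graph.get(u, []) if v in keep] for u in keep}
    rev = {u: [] for u in keep}
    for u, vs in fwd.items():
        for v in vs:
            rev[v].append(u)

    # Pass 1: iterative DFS on the forward graph, recording finish order.
    order, seen = [], set()
    for s in fwd:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(fwd[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(fwd[v])))
                    break
            else:
                # All out-neighbors explored: u is finished.
                order.append(u)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing finish time.
    best, assigned = set(), set()
    for s in reversed(order):
        if s in assigned:
            continue
        comp, stack = {s}, [s]
        assigned.add(s)
        while stack:
            u = stack.pop()
            for v in rev[u]:
                if v not in assigned:
                    assigned.add(v)
                    comp.add(v)
                    stack.append(v)
        if len(comp) > len(best):
            best = comp
    return best


# Toy graph: a 3-cycle {0, 1, 2} feeding into a 2-cycle {3, 4}.
toy = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3]}
whole = largest_scc(toy)
restricted = largest_scc(toy, nodes={2, 3, 4})
```

Every node inside one SCC can reach every other node in it, so a traversal that enters the component can, given enough steps, visit all of it. The toy graph above only illustrates the definition; it does not reproduce the paper's experiments.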
Abstract
Nearest neighbor search has found numerous applications in machine learning, data mining and massive data processing systems. The past few years have witnessed the popularity of the graph-based nearest neighbor search paradigm because of its superiority over space-partitioning algorithms. While many empirical studies demonstrate the efficiency of graph-based algorithms, not much attention has been paid to a more fundamental question: why do graph-based algorithms work so well in practice? And which data property affects the efficiency, and how? In this paper, we try to answer these questions. Our insight is that "the probability that the neighbors of a point o tend to be neighbors in the KNN graph" is a crucial data property for query efficiency. For a given dataset, such a property can be qualitatively measured by the clustering coefficient of the KNN graph. To show how clustering…
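The property quoted in the abstract can be illustrated with a small sketch. The code below builds a brute-force directed KNN graph and, for each point o, computes the fraction of ordered pairs of o's neighbors that are themselves linked. This directed variant of the clustering coefficient and the helper names `knn_graph`, `local_clustering` and `average_clustering` are assumptions chosen for illustration; the paper's exact definition may differ.

```python
import math
from itertools import permutations


def knn_graph(points, k):
    """Brute-force directed KNN graph: node i -> indices of its k nearest other points."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph[i] = others[:k]
    return graph


def local_clustering(graph, o):
    """Fraction of ordered neighbor pairs (u, v) of o with an edge u -> v.

    Assumed directed definition, for illustration only.
    """
    nbrs = graph[o]
    if len(nbrs) < 2:
        return 0.0
    pairs = list(permutations(nbrs, 2))
    linked = sum(1 for u, v in pairs if v in graph[u])
    return linked / len(pairs)


def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(graph, o) for o in graph) / len(graph)


# Two tight, well-separated clusters of three points each (toy data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
g = knn_graph(points, k=2)
avg = average_clustering(g)
```

In this toy dataset every point's two nearest neighbors are the other two members of its cluster, so all neighbor pairs are linked and the average coefficient takes its maximum value of 1.0. The brute-force construction is quadratic in the number of points and is meant only for small examples.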
Taxonomy
Topics: Data Management and Algorithms · Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression
