Exploring the Meaningfulness of Nearest Neighbor Search in High-Dimensional Space
Zhonghan Chen, Ruiyuan Zhang, Xi Zhao, Xiaojun Cheng, Xiaofang Zhou

TL;DR
This study investigates the effectiveness of nearest neighbor search (NNS) in high-dimensional spaces, especially for text embeddings. It finds that text embeddings are less affected by the curse of dimensionality and that the choice of distance function has minimal impact.
Contribution
The paper provides extensive empirical analysis of NNS in high-dimensional embeddings, demonstrating their robustness and practical relevance across various datasets and distance metrics.
Findings
High-dimensional text embeddings are more resilient to the curse of dimensionality.
The choice of distance function has minimal impact on NNS relevance.
Dense vector representations remain effective for retrieval tasks in high dimensions.
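To make the "curse of dimensionality" in these findings concrete, the sketch below measures how distances concentrate as dimensionality grows. It is an illustration under stated assumptions, not the paper's procedure: the data are uniform random vectors rather than real embeddings, the metric is Euclidean distance, and relative_contrast, uniform_points, and contrast_for_dim are hypothetical helper names. Relative contrast here means the mean distance from a query divided by the distance to its nearest neighbor; values close to 1 suggest the nearest neighbor is barely closer than an average point.

```python
import math
import random


def relative_contrast(points, query):
    """Mean L2 distance from the query divided by the nearest-neighbor distance."""
    dists = [math.dist(query, p) for p in points]
    return (sum(dists) / len(dists)) / min(dists)


def uniform_points(n, dim, rng):
    """Draw n points uniformly from the unit hypercube (assumed synthetic data)."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def contrast_for_dim(dim, n=500, seed=0):
    """Relative contrast of one random query against n random points."""
    rng = random.Random(seed)
    points = uniform_points(n, dim, rng)
    query = [rng.random() for _ in range(dim)]
    return relative_contrast(points, query)


if __name__ == "__main__":
    # For random data the ratio is expected to shrink toward 1 as dim grows.
    for dim in (2, 16, 128, 1024):
        print(dim, round(contrast_for_dim(dim), 3))
```

Running the same measurement on real text embeddings instead of uniform_points would be one way to compare them against random vectors, assuming the embeddings are available as lists of floats.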
Abstract
Dense high-dimensional vectors are becoming increasingly vital in fields such as computer vision, machine learning, and large language models (LLMs), serving as standard representations for multimodal data. The dimensionality of these vectors can now easily exceed several thousand. Although nearest neighbor search (NNS) over these dense high-dimensional vectors has been widely used for retrieval-augmented generation (RAG) and many other applications, the effectiveness of NNS in such high-dimensional spaces remains uncertain, given the possible challenge posed by the "curse of dimensionality." To address this question, in this paper we conduct extensive NNS studies with different distance functions, such as L1 distance, L2 distance, and angular distance, across diverse embedding datasets of varied types, dimensionality, and modality. Our aim is to investigate factors…
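The abstract refers to NNS under several distance functions. As a minimal sketch of what that means in practice, the following brute-force search ranks vectors by L1, L2, or angular distance. The function names and the toy vectors are illustrative assumptions and are not taken from the paper; a production system would typically use an approximate index rather than an exhaustive scan.

```python
import math


def l1_distance(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angular_distance(a, b):
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos)


def nearest_neighbors(query, vectors, distance, k=1):
    """Return indices of the k vectors closest to the query (exhaustive scan)."""
    ranked = sorted(range(len(vectors)), key=lambda i: distance(query, vectors[i]))
    return ranked[:k]


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]  # toy data, assumed
    query = [2.0, 2.0]
    for fn in (l1_distance, l2_distance, angular_distance):
        print(fn.__name__, nearest_neighbors(query, vectors, fn, k=2))
```

Passing the distance function as a parameter keeps the search loop identical across metrics, which makes it straightforward to compare how much the returned neighbors change from one metric to another.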
Taxonomy
Topics: Data Management and Algorithms · Geographic Information Systems Studies · Advanced Image and Video Retrieval Techniques
