Dimensionality-Reduction Techniques for Approximate Nearest Neighbor Search: A Survey and Evaluation
Zeyu Wang, Haoran Xiong, Qitong Wang, Zhenying He, Peng Wang, Themis Palpanas, Wei Wang

TL;DR
This paper surveys and evaluates six dimensionality-reduction techniques for speeding up approximate nearest neighbor search (ANNS) over high-dimensional vectors, a setting that is increasingly common in large-scale machine learning and deep learning applications.
Contribution
It provides a comprehensive review, theoretical analysis, and empirical evaluation of classical and deep learning-based dimensionality-reduction methods for ANNS.
Findings
Deep learning-based techniques show promising accuracy improvements.
Classical methods like PCA and vector quantization are computationally efficient.
Evaluation on six datasets reveals the strengths and limitations of each technique.
Abstract
Approximate Nearest Neighbor Search (ANNS) on high-dimensional vectors has become a fundamental and essential component in various machine learning tasks. Recently, with the rapid development of deep learning models and the applications of Large Language Models (LLMs), the dimensionality of the vectors keeps growing in order to accommodate a richer semantic representation. This poses a major challenge to the ANNS solutions since distance calculation cost in ANNS grows linearly with the dimensionality of vectors. To overcome this challenge, dimensionality-reduction techniques can be leveraged to accelerate the distance calculation in the search process. In this paper, we investigate six dimensionality-reduction techniques that have the potential to improve ANNS solutions, including classical algorithms such as PCA and vector quantization, as well as algorithms based on deep learning…
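To make the cost argument in the abstract concrete, here is a minimal sketch of one way a reduction such as PCA can cut distance-calculation cost: candidates are filtered using distances in a d-dimensional projection, and only the survivors are reranked with exact D-dimensional distances. This is an illustration under stated assumptions, not the pipeline evaluated in the paper. The function names (`fit_pca`, `project`, `search`), the parameters `k` and `n_candidates`, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_pca(data, d):
    """Return (mean, components) for projecting onto the top-d principal axes."""
    mean = data.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:d]

def project(x, mean, components):
    """Map D-dimensional vector(s) to the d-dimensional PCA subspace."""
    return (x - mean) @ components.T

def search(query, data, reduced, mean, components, k=10, n_candidates=100):
    """Filter with reduced-space distances, then rerank the candidates exactly."""
    q_red = project(query, mean, components)
    # Cheap pass: each distance costs O(d) instead of O(D).
    approx = np.sum((reduced - q_red) ** 2, axis=1)
    cand = np.argpartition(approx, n_candidates - 1)[:n_candidates]
    # Expensive pass: exact distances for the shortlisted candidates only.
    exact = np.sum((data[cand] - query) ** 2, axis=1)
    return cand[np.argsort(exact)[:k]]

# Synthetic data (an assumption for illustration): 256-dimensional vectors
# that lie close to a 16-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 16))
mixing = rng.normal(size=(16, 256))
data = latent @ mixing + 0.01 * rng.normal(size=(2000, 256))

mean, comps = fit_pca(data, d=32)
reduced = project(data, mean, comps)

query = data[0] + 0.01 * rng.normal(size=256)
result = search(query, data, reduced, mean, comps, k=10, n_candidates=100)
```

Because the projection is orthonormal, the reduced-space distance never exceeds the exact distance, so it serves as a lower bound in the filtering pass. How many candidates must be reranked to reach a given recall depends on how much variance the retained components capture.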
Taxonomy
Topics: Data Management and Algorithms · Advanced Image and Video Retrieval Techniques · Optimization and Search Problems
