A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search
Deng Cai

TL;DR
This paper reevaluates various hashing algorithms for Approximate Nearest Neighbor Search, revealing that simple random-projection-based LSH outperforms more complex methods, challenging previous claims and emphasizing the importance of thorough evaluation.
Contribution
The paper introduces a simple two-level index structure and provides a comprehensive comparison of eleven hashing algorithms, highlighting the underestimated performance of random-projection-based LSH.
Findings
Random-projection-based LSH outperforms other hashing algorithms.
Thorough evaluation reveals previous claims about algorithm performance were inaccurate.
Code release enables reproducibility and fair comparison.
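For readers unfamiliar with the method named in the findings above, the following is a minimal sketch of random-projection-based LSH, not the paper's released code. It assumes sign-of-projection binary codes and a plain Hamming-ranking search followed by exact reranking with squared Euclidean distance; these choices are made here for brevity and are not taken from the paper. All function names and parameters (`make_projections`, `encode`, `search`, `n_candidates`) are hypothetical.

```python
import random


def make_projections(dim, code_length, seed=0):
    """Draw code_length random Gaussian directions in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(code_length)]


def encode(vector, projections):
    """Pack one bit per projection into an int: 1 if the dot product is positive."""
    code = 0
    for direction in projections:
        dot = sum(v * w for v, w in zip(vector, direction))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def search(query, data, codes, projections, n_candidates, k):
    """Rank all codes by Hamming distance, then rerank the top candidates exactly."""
    q_code = encode(query, projections)
    order = sorted(range(len(data)), key=lambda i: hamming(q_code, codes[i]))
    candidates = order[:n_candidates]

    def sq_dist(i):
        return sum((x - y) ** 2 for x, y in zip(query, data[i]))

    return sorted(candidates, key=sq_dist)[:k]


# Toy data (assumption: 200 random 16-dimensional points, 64-bit codes).
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
projections = make_projections(dim=16, code_length=64)
codes = [encode(x, projections) for x in data]

query = data[0]
result = search(query, data, codes, projections, n_candidates=20, k=5)
```

In this sketch, a longer code costs more per Hamming comparison but separates points more finely, and `n_candidates` trades search time against accuracy. This is the kind of time-versus-accuracy trade-off the abstract says an evaluation should measure.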
Abstract
Approximate Nearest Neighbor Search (ANNS) is a fundamental problem in many areas of machine learning and data mining. During the past decade, numerous hashing algorithms have been proposed to solve this problem. Every proposed algorithm claims to outperform other state-of-the-art hashing methods. However, the evaluation in these hashing papers was not thorough enough, and those claims should be re-examined. The ultimate goal of an ANNS method is to return the most accurate answers (nearest neighbors) in the shortest time. If implemented correctly, almost all hashing methods will have their performance improved as the code length increases. However, many existing hashing papers only report performance with code lengths shorter than 128. In this paper, we carefully revisit the problem of search with a hash index, and analyze the pros and cons of two popular hash index search…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Caching and Content Delivery
