Hashing for Similarity Search: A Survey
Jingdong Wang, Heng Tao Shen, Jingkuan Song, and Jianqiu Ji

TL;DR
This survey reviews hashing techniques for similarity search, focusing on locality sensitive hashing and learning to hash methods, discussing their design, distance measures, and search schemes.
Contribution
It provides a comprehensive overview of hashing algorithms for similarity search, categorizing them into LSH and learning-based methods, and analyzing their key aspects.
Findings
Hashing methods are effective for approximate similarity search.
LSH designs hash functions without data distribution knowledge.
Learning to hash adapts hash functions based on data distribution.
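To make the data-dependence point concrete, here is a toy sketch, not a method taken from the survey. It assumes the simplest possible "learned" hash: one bit per coordinate, with each threshold fitted to the median of the data. The names `fit_thresholds` and `encode` are hypothetical.

```python
import statistics

def fit_thresholds(data):
    """Fit one threshold per coordinate: the median of that coordinate over the data."""
    dim = len(data[0])
    return [statistics.median(row[j] for row in data) for j in range(dim)]

def encode(vector, thresholds):
    """Binary code: bit j is 1 if coordinate j lies above its threshold."""
    return tuple(1 if v > t else 0 for v, t in zip(vector, thresholds))

# Toy data whose coordinates are all non-negative (an assumption for illustration).
data = [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0], [3.0, 13.0]]

# Data-dependent thresholds split each coordinate at its median.
learned = fit_thresholds(data)
learned_codes = [encode(x, learned) for x in data]

# A fixed threshold at 0 ignores the data distribution.
fixed = [0.0, 0.0]
fixed_codes = [encode(x, fixed) for x in data]
```

On this toy data the fixed thresholds map three of the four points to the same code, while the fitted thresholds split the points evenly between two codes. Practical learning-to-hash methods optimize far richer objectives than a per-coordinate median.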
Abstract
Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and much recent effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measures, and search schemes in the hash coding space.
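As a minimal sketch of the data-independent category, the following assumes random hyperplane hashing, a well-known locality sensitive hashing family for angular similarity, combined with a linear scan that ranks database codes by Hamming distance. The function names and parameters are hypothetical, and this is only one of many possible hash families and search schemes.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw random Gaussian hyperplanes without looking at any data."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_code(vector, hyperplanes):
    """Pack one bit per hyperplane (the sign of the dot product) into an int."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code

def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")

def hamming_rank(query, database, hyperplanes, k=1):
    """Return indices of the k database items closest to the query in code space."""
    q = hash_code(query, hyperplanes)
    codes = [hash_code(x, hyperplanes) for x in database]
    order = sorted(range(len(database)), key=lambda i: hamming(q, codes[i]))
    return order[:k]

planes = make_hyperplanes(num_bits=16, dim=3, seed=42)
query = [1.0, 2.0, 3.0]
database = [[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0], [2.0, 4.0, 6.0]]
nearest = hamming_rank(query, database, planes, k=1)
```

Because each bit depends only on the sign of a projection, vectors pointing in the same direction receive identical codes and opposite vectors receive complementary codes. The ranking is approximate: items at a small Hamming distance are likely, but not guaranteed, to be close in the original space.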
