Hashing for Similarity Search: A Survey

Jingdong Wang, Heng Tao Shen, Jingkuan Song, and Jianqiu Ji

arXiv:1408.2927 · cs.DS · August 14, 2014

TL;DR

This survey reviews hashing techniques for similarity search, covering locality sensitive hashing and learning to hash methods in terms of hash function design, distance measures, and search schemes.

Contribution

The paper gives a comprehensive overview of hashing algorithms for similarity search, dividing them into locality sensitive hashing (LSH) and learning-based methods and reviewing each category.

Findings

01. Hashing methods are effective for approximate similarity search.

02. LSH designs hash functions without exploring the data distribution.

03. Learning to hash learns hash functions according to the data distribution.
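
The contrast between the last two findings can be made concrete with a small Python sketch. It is illustrative only and not taken from the survey: `random_hyperplane_hash` uses random-hyperplane hashing, a well-known LSH family for angular similarity, while `median_threshold_hash` is a deliberately simple toy stand-in for learning to hash (it only fits per-coordinate thresholds to the data). Both function names and the median rule are assumptions made for this example.

```python
import random


def random_hyperplane_hash(dim, n_bits, seed=0):
    """Data-independent: directions are drawn at random, never fitted to data."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def h(x):
        # One bit per hyperplane: which side of the plane x falls on.
        return tuple(
            1 if sum(w_j * x_j for w_j, x_j in zip(w, x)) >= 0 else 0
            for w in planes
        )

    return h


def median_threshold_hash(data):
    """Data-dependent (toy assumption): threshold each coordinate at its median."""
    dim = len(data[0])
    thresholds = []
    for j in range(dim):
        column = sorted(x[j] for x in data)
        thresholds.append(column[len(column) // 2])

    def h(x):
        # One bit per coordinate: is x at or above the fitted threshold?
        return tuple(1 if x[j] >= thresholds[j] else 0 for j in range(dim))

    return h
```

The first function needs only the dimensionality of the data, whereas the second cannot be built without a sample of the database, which is the essential difference between the two categories.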

Abstract

Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently much effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
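
To make "search scheme in the hash coding space" more tangible, the sketch below shows two commonly used schemes on binary codes: exact bucket lookup in a hash table, and ranking all items by Hamming distance to the query code. The helper names (`build_table`, `hamming_rank`) and the tiny code set are hypothetical, chosen for this example rather than taken from the paper.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def build_table(codes):
    """Hash table lookup: map each code to the indices of items that share it."""
    table = defaultdict(list)
    for idx, code in enumerate(codes):
        table[tuple(code)].append(idx)
    return table


def hamming_rank(codes, query_code, k):
    """Code ranking: return the k item indices closest to the query in Hamming distance."""
    order = sorted(range(len(codes)), key=lambda i: hamming(codes[i], query_code))
    return order[:k]


codes = [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 0, 1)]
table = build_table(codes)
candidates = table.get((0, 0, 1), [])        # items in the query's bucket
nearest = hamming_rank(codes, (1, 0, 1), 2)  # two closest codes overall
```

Bucket lookup touches only items that collide with the query, while Hamming ranking scans every code but compares short binary strings instead of the original vectors.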
