Learning to Hash for Indexing Big Data - A Survey
Jun Wang, Wei Liu, Sanjiv Kumar, Shih-Fu Chang

TL;DR
This survey reviews the development of learning-based hashing methods for efficient big data indexing, highlighting their advantages over traditional randomized approaches and discussing recent deep learning techniques.
Contribution
It provides a comprehensive overview of learning-to-hash techniques, covering unsupervised, semi-supervised, supervised, and deep learning approaches, and outlines future research directions.
Findings
Learning-to-hash methods improve search accuracy over traditional data-independent randomized hashing (e.g., LSH).
Deep learning-based hashing techniques have shown promising results.
The survey identifies key challenges and future trends in the field.
Abstract
The explosive growth in big data has attracted much attention in designing efficient indexing and search methods recently. In many critical applications such as large-scale search and pattern matching, finding the nearest neighbors to a query is a fundamental research problem. However, the straightforward solution using exhaustive comparison is infeasible due to the prohibitive computational complexity and memory requirement. In response, Approximate Nearest Neighbor (ANN) search based on hashing techniques has become popular due to its promising performance in both efficiency and accuracy. Prior randomized hashing methods, e.g., Locality-Sensitive Hashing (LSH), explore data-independent hash functions with random projections or permutations. Although having elegant theoretic guarantees on the search quality in certain metric spaces, performance of randomized hashing has been shown…
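As a rough illustration of the data-independent baseline the abstract mentions, the sketch below hashes vectors with random hyperplane projections (sign-based LSH for cosine similarity) and ranks database items by Hamming distance to the query code. It is not taken from the survey; the function names `make_random_projection_hasher` and `hamming_rank`, the code length, and the synthetic data are all assumptions made for illustration. Learning-to-hash methods would replace the random `planes` matrix with projections fitted to the data.

```python
import numpy as np


def make_random_projection_hasher(dim, n_bits, seed=0):
    """Return a function mapping vectors to n_bits-long binary codes.

    Hypothetical helper: each bit is the sign of a projection onto a random
    Gaussian direction, so the hash functions do not depend on the data.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, dim))  # one random hyperplane per bit

    def hash_codes(X):
        X = np.atleast_2d(X)
        # Bit is 1 when the point lies on the non-negative side of the hyperplane.
        return (X @ planes.T >= 0).astype(np.uint8)

    return hash_codes


def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to a single query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists


# Illustrative usage on synthetic data (assumed sizes, not from the paper).
data_rng = np.random.default_rng(1)
database = data_rng.standard_normal((1000, 32))
hasher = make_random_projection_hasher(dim=32, n_bits=64)
db_codes = hasher(database)

query = database[42]
order, dists = hamming_rank(hasher(query)[0], db_codes)
print("closest candidates by Hamming distance:", order[:5])
```

In practice the top-ranked candidates would then be re-ranked with the exact distance, which is what makes the search approximate rather than exhaustive.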
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Caching and Content Delivery
