Self-Taught Hashing for Fast Similarity Search
Dell Zhang, Jun Wang, Deng Cai, Jinsong Lu

TL;DR
This paper introduces Self-Taught Hashing (STH), a method that combines unsupervised and supervised learning to generate compact binary codes for fast similarity search. It handles previously unseen documents and, in the authors' experiments, outperforms existing techniques.
Contribution
The paper proposes a novel two-step approach. First, it uses unsupervised learning to find optimal binary codes for the documents in the corpus. Second, it trains classifiers, one per bit, that predict the codes of unseen documents.
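The following is a minimal sketch of the unsupervised first step. It assumes the binarised Laplacian Eigenmap that the paper reports using. The sketch builds a document similarity graph, solves the Laplacian Eigenmap eigenproblem, and thresholds each eigenvector to obtain one bit per dimension.

Several details are illustrative assumptions rather than the authors' exact settings: the cosine kNN graph, the neighbourhood size `k`, the median threshold, and the helper names `knn_affinity` and `binarised_lapeig`. Features are also assumed to be nonnegative (e.g. tf-idf), so every document has a positive degree.

```python
import numpy as np

def knn_affinity(X, k):
    """Symmetric kNN graph weighted by cosine similarity (an illustrative choice).

    Assumes nonnegative features (e.g. tf-idf), so all edge weights are >= 0.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)              # never pick a document as its own neighbour
    nbrs = np.argsort(-S, axis=1)[:, :k]      # k most similar documents per row
    W = np.zeros_like(S)
    rows = np.arange(S.shape[0])[:, None]
    W[rows, nbrs] = S[rows, nbrs]
    return np.maximum(W, W.T)                 # symmetrise: edge if either end lists the other

def binarised_lapeig(X, n_bits, k=5):
    """Laplacian Eigenmap embedding, thresholded at each dimension's median (assumed rule)."""
    W = knn_affinity(X, k)
    d = W.sum(axis=1)                         # node degrees (positive for nonnegative X)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Normalised Laplacian I - D^{-1/2} W D^{-1/2}; each eigenvector v gives a
    # solution y = D^{-1/2} v of the generalised problem (D - W) y = lambda D y.
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)           # eigenvalues in ascending order
    # Drop the trivial first eigenvector (assumes a connected graph).
    Y = d_inv_sqrt[:, None] * vecs[:, 1:n_bits + 1]
    return (Y > np.median(Y, axis=0)).astype(np.uint8)

rng = np.random.default_rng(0)
X = rng.random((40, 20))          # 40 toy "documents" with 20 nonnegative term weights
codes = binarised_lapeig(X, n_bits=8, k=5)
print(codes.shape)                # (40, 8)
```

Thresholding at the median makes each bit split the corpus roughly in half. This keeps individual bits from being nearly constant and therefore uninformative.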
Findings
Outperforms state-of-the-art techniques on real-world text datasets
Uses a binarised Laplacian Eigenmap (unsupervised step) and linear SVMs (supervised step) to produce high-quality codes
Generates codes for previously unseen query documents, not only for documents known in advance
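The supervised second step can be sketched as one linear classifier per bit. Each classifier is trained to reproduce the corpus codes from document features and is then applied to unseen documents.

The paper reports using linear SVMs. Here, a small Pegasos-style stochastic subgradient solver for the hinge loss stands in for a production SVM library. The function names (`train_linear_svm`, `train_hash_functions`, `predict_codes`) and hyperparameters (`lam`, `epochs`) are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD on the L2-regularised hinge loss; y must be in {-1, +1}.

    A stand-in for a real linear SVM solver. The bias is folded in as a constant
    feature, so it is regularised too (a simplification).
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos step-size schedule
            margin = y[i] * (Xb[i] @ w)        # margin under the current weights
            w *= 1.0 - eta * lam               # shrinkage from the L2 regulariser
            if margin < 1.0:                   # hinge loss active: move towards y_i * x_i
                w += eta * y[i] * Xb[i]
    return w

def train_hash_functions(X, codes, **svm_kwargs):
    """Train one classifier per bit; 0/1 code bits become -1/+1 labels."""
    return np.stack(
        [train_linear_svm(X, 2.0 * codes[:, b] - 1.0, **svm_kwargs)
         for b in range(codes.shape[1])],
        axis=1,
    )  # shape (n_features + 1, n_bits)

def predict_codes(X, W):
    """Predict binary codes for (possibly unseen) documents: bit = 1 where score > 0."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (Xb @ W > 0).astype(np.uint8)

# Toy usage: these targets stand in for codes produced by the unsupervised step.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(200, 10))
codes = (X_train @ rng.normal(size=(10, 4)) > 0).astype(np.uint8)
W = train_hash_functions(X_train, codes)
X_new = rng.uniform(-1.0, 1.0, size=(5, 10))   # "unseen" documents
print(predict_codes(X_new, W))
```

Because each bit is predicted by an independent linear function, encoding a new document costs one matrix-vector product. No graph over the corpus needs to be rebuilt.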
Abstract
The ability to perform fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is semantic hashing, which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes (within a short Hamming distance). Although some recently proposed techniques are able to generate high-quality codes for documents known in advance, obtaining the codes for previously unseen documents remains a very challenging problem. In this paper, we emphasise this issue and propose a novel Self-Taught Hashing (STH) approach to semantic hashing: we first find the optimal l-bit binary codes for all documents in the given corpus via unsupervised learning, and then train l classifiers via supervised learning to predict the l-bit code for any query…
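Once every document has a binary code, comparing two documents reduces to computing the Hamming distance between their codes: an XOR followed by a bit count. The sketch below packs codes into Python integers and returns the documents within a given Hamming radius of a query, nearest first.

The function names (`pack_code`, `hamming`, `hamming_search`) are hypothetical. The radius-based retrieval rule is an illustrative assumption, not necessarily how the paper ranks results.

```python
def pack_code(bits):
    """Pack a sequence of 0/1 bits (most significant first) into an int."""
    value = 0
    for b in bits:
        value = (value << 1) | int(b)
    return value

def hamming(a, b):
    """Number of differing bits between two packed codes (XOR, then popcount)."""
    return bin(a ^ b).count("1")

def hamming_search(query, corpus_codes, radius):
    """Indices of corpus codes within `radius` bits of `query`, nearest first."""
    hits = sorted((hamming(query, c), i) for i, c in enumerate(corpus_codes))
    return [i for dist, i in hits if dist <= radius]

corpus = [pack_code(b) for b in ([0, 1, 1, 0], [0, 1, 1, 1], [1, 0, 0, 1], [0, 1, 0, 0])]
query = pack_code([0, 1, 1, 0])
print(hamming_search(query, corpus, radius=1))  # [0, 1, 3]
```

Each comparison touches only a few machine words regardless of vocabulary size. This is why compact codes make large-scale search fast.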
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression · Information Retrieval and Search Behavior
