TL;DR
This paper introduces MISH, an unsupervised semantic hashing model optimized for multi-index hashing, achieving faster search times while maintaining high effectiveness in document similarity retrieval.
Contribution
MISH is a novel unsupervised hashing approach with training objectives that optimize hash codes for multi-index hashing, enhancing search efficiency without sacrificing accuracy.
Findings
MISH searches over 33% faster than state-of-the-art baselines.
MISH maintains state-of-the-art effectiveness in document similarity search.
The training objectives are model-agnostic and easily integrable into existing models.
Abstract
Semantic hashing represents documents as compact binary vectors (hash codes) and allows both efficient and effective similarity search in large-scale information retrieval. The state of the art has primarily focused on learning hash codes that improve similarity search effectiveness, while assuming a brute-force linear scan strategy for searching over all the hash codes, even though much faster alternatives exist. One such alternative is multi-index hashing, an approach that constructs a smaller candidate set to search over, which, depending on the distribution of the hash codes, can lead to sub-linear search time. In this work, we propose Multi-Index Semantic Hashing (MISH), an unsupervised hashing model that learns hash codes that are both effective and highly efficient by being optimized for multi-index hashing. We derive novel training objectives that enable learning hash codes that…
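To make the contrast between a brute-force linear scan and multi-index hashing concrete, here is a minimal Python sketch of the standard pigeonhole-based multi-index hashing scheme for fixed-radius Hamming search. It illustrates the search strategy only, not MISH itself: no learned codes or MISH training objectives are shown. It assumes that codes are plain integers of `n_bits` bits, that `n_bits` is divisible by the number of substrings `m`, and that all names (`linear_scan`, `MultiIndexHash`, `search`) are hypothetical. The key idea is that if two codes are within Hamming distance `r`, at least one of their `m` disjoint substrings differs in at most `r // m` bits. Probing each substring table within that smaller radius therefore yields a candidate set that contains every true match.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")


def linear_scan(codes: list[int], query: int, radius: int) -> list[int]:
    """Brute-force baseline: compare the query against every stored code."""
    return [i for i, c in enumerate(codes) if hamming(c, query) <= radius]


def substrings(code: int, n_bits: int, m: int) -> list[int]:
    """Split an n_bits code into m disjoint, contiguous substrings of equal width."""
    width = n_bits // m
    mask = (1 << width) - 1
    return [(code >> (k * width)) & mask for k in range(m)]


class MultiIndexHash:
    """Hypothetical sketch: one hash table per substring, keyed by substring value."""

    def __init__(self, codes: list[int], n_bits: int, m: int):
        assert n_bits % m == 0, "assumes n_bits is divisible by m"
        self.codes, self.n_bits, self.m = codes, n_bits, m
        self.width = n_bits // m
        self.tables = [defaultdict(list) for _ in range(m)]
        for i, code in enumerate(codes):
            for k, sub in enumerate(substrings(code, n_bits, m)):
                self.tables[k][sub].append(i)

    def _probes(self, sub: int, radius: int):
        """Yield every width-bit value within Hamming distance `radius` of `sub`."""
        for d in range(radius + 1):
            for positions in combinations(range(self.width), d):
                probe = sub
                for p in positions:
                    probe ^= 1 << p
                yield probe

    def search(self, query: int, radius: int) -> list[int]:
        """Return indices of all codes within Hamming distance `radius` of `query`."""
        sub_radius = radius // self.m  # pigeonhole bound per substring
        candidates: set[int] = set()
        for k, sub in enumerate(substrings(query, self.n_bits, self.m)):
            for probe in self._probes(sub, sub_radius):
                candidates.update(self.tables[k].get(probe, ()))
        # Verify candidates with the full-length Hamming distance.
        return sorted(i for i in candidates if hamming(self.codes[i], query) <= radius)


if __name__ == "__main__":
    import random

    random.seed(0)
    n_bits, m = 32, 4
    codes = [random.getrandbits(n_bits) for _ in range(1000)]
    index = MultiIndexHash(codes, n_bits, m)
    query = codes[0] ^ 0b101  # codes[0] with two bits flipped
    assert index.search(query, 3) == linear_scan(codes, query, 3)
```

Because every candidate is verified against the full code, the results match the linear scan exactly. Any speedup comes from the candidate set being much smaller than the collection. As the abstract notes, that size depends on how the hash codes are distributed: codes that crowd into a few substring buckets produce large candidate sets, which is the property MISH's training objectives are said to target.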