Unsupervised Multi-Index Semantic Hashing

Christian Hansen; Casper Hansen; Jakob Grue Simonsen; Stephen Alstrup; Christina Lioma

arXiv:2103.14460·cs.IR·March 29, 2021

TL;DR

This paper introduces MISH, an unsupervised semantic hashing model optimized for multi-index hashing, achieving faster search times while maintaining high effectiveness in document similarity retrieval.

Contribution

MISH is a novel unsupervised hashing approach with training objectives that optimize hash codes for multi-index hashing, enhancing search efficiency without sacrificing accuracy.

Findings

01 MISH is more than 33% faster than state-of-the-art baselines.

02 MISH maintains state-of-the-art effectiveness in document similarity search.

03 The training objectives are model-agnostic and can be integrated into existing models.

Abstract

Semantic hashing represents documents as compact binary vectors (hash codes) and allows both efficient and effective similarity search in large-scale information retrieval. The state of the art has primarily focused on learning hash codes that improve similarity search effectiveness, while assuming a brute-force linear scan strategy for searching over all the hash codes, even though much faster alternatives exist. One such alternative is multi-index hashing, an approach that constructs a smaller candidate set to search over, which, depending on the distribution of the hash codes, can lead to sub-linear search time. In this work, we propose Multi-Index Semantic Hashing (MISH), an unsupervised hashing model that learns hash codes that are both effective and highly efficient by being optimized for multi-index hashing. We derive novel training objectives, which enable learning hash codes that…
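
To make the contrast between a linear scan and multi-index hashing concrete, here is a minimal Python sketch of the generic multi-index hashing idea: split each code into substrings, index each substring in its own table, and verify only the documents that collide with the query in at least one table. It is an illustration under stated assumptions, not the MISH model or the code in the linked repository. All names (`split_code`, `build_index`, `radius_search`, and so on) are hypothetical, codes are stored as Python integers, and the code length is assumed to be divisible by the number of substrings.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Hamming distance between two hash codes stored as integers."""
    return bin(a ^ b).count("1")


def split_code(code: int, n_bits: int, n_parts: int) -> List[int]:
    """Split an n_bits-wide code into n_parts equal-width substrings."""
    width = n_bits // n_parts  # assumes n_bits is divisible by n_parts
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(n_parts)]


def build_index(codes: Sequence[int], n_bits: int, n_parts: int) -> List[Dict[int, List[int]]]:
    """One hash table per substring position: substring value -> document ids."""
    tables: List[Dict[int, List[int]]] = [defaultdict(list) for _ in range(n_parts)]
    for doc_id, code in enumerate(codes):
        for table, sub in zip(tables, split_code(code, n_bits, n_parts)):
            table[sub].append(doc_id)
    return tables


def probes_within(sub: int, width: int, radius: int) -> List[int]:
    """All width-bit values within Hamming distance `radius` of `sub`."""
    out = [sub]
    for r in range(1, radius + 1):
        for positions in combinations(range(width), r):
            flipped = sub
            for p in positions:
                flipped ^= 1 << p
            out.append(flipped)
    return out


def radius_search(query: int, codes: Sequence[int], tables, n_bits: int,
                  n_parts: int, radius: int) -> Tuple[List[int], int]:
    """Return (ids within `radius` of the query, number of candidates verified)."""
    width = n_bits // n_parts
    # Pigeonhole: if the full codes differ in at most `radius` bits, at least
    # one of the n_parts substrings differs in at most radius // n_parts bits.
    sub_radius = radius // n_parts
    candidates = set()
    for table, sub in zip(tables, split_code(query, n_bits, n_parts)):
        for probe in probes_within(sub, width, sub_radius):
            candidates.update(table.get(probe, ()))
    # Verify candidates with the full Hamming distance to drop false positives.
    hits = sorted(d for d in candidates if hamming(query, codes[d]) <= radius)
    return hits, len(candidates)


def linear_scan(query: int, codes: Sequence[int], radius: int) -> List[int]:
    """Brute-force baseline: compare the query against every code."""
    return [d for d, c in enumerate(codes) if hamming(query, c) <= radius]


# Toy data (made up for illustration): five 8-bit codes, two substrings of 4 bits.
codes = [0b00000000, 0b00000011, 0b11110000, 0b11111111, 0b00010001]
tables = build_index(codes, n_bits=8, n_parts=2)
hits, n_candidates = radius_search(0b00000001, codes, tables, n_bits=8, n_parts=2, radius=2)
baseline = linear_scan(0b00000001, codes, radius=2)
```

Both searches return the same documents, but the multi-index variant computes full Hamming distances only for the candidate set. How small that set is depends on how the codes are distributed across the substring tables, which is the property the abstract says MISH is trained to improve.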

Code & Models

Repositories

Varyn/MISH
tf · Official