Information-Theoretic Hashing for Zero-Shot Cross-Modal Retrieval
Yufeng Shi, Shujian Yu, Duanquan Xu, Xinge You

TL;DR
This paper introduces an information-theoretic approach to zero-shot cross-modal retrieval, constructing a common Hamming space without relying on pre-trained NLP embeddings, and demonstrates its effectiveness on benchmark datasets.
Contribution
The paper proposes a novel Information-Theoretic Hashing (ITH) model with adaptive aggregation and semantic-preserving encoding for zero-shot cross-modal retrieval.
Findings
ITH outperforms existing methods on three benchmark datasets.
The adaptive information aggregation effectively captures intrinsic semantics.
Semantic-preserving encoding maintains semantic similarity in hash codes.
Abstract
Zero-shot cross-modal retrieval (ZS-CMR) deals with the retrieval problem among heterogeneous data from unseen classes. Typically, to guarantee generalization, pre-defined class embeddings from natural language processing (NLP) models are used to build a common space. In this paper, instead of using an extra NLP model to define a common space beforehand, we consider a totally different way to construct (or learn) a common Hamming space from an information-theoretic perspective. We term our model Information-Theoretic Hashing (ITH), which is composed of two cascading modules: an Adaptive Information Aggregation (AIA) module and a Semantic Preserving Encoding (SPE) module. Specifically, our AIA module takes inspiration from the Principle of Relevant Information (PRI) to construct a common space that adaptively aggregates the intrinsic semantics of different modalities of data…
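To make the idea of retrieval in a common Hamming space concrete, the following is a minimal sketch, not the authors' ITH model. It assumes that some encoder has already mapped an image query and a set of text items into real-valued vectors of the same length; the toy feature values, the sign-threshold binarization, and the function names `to_hash_codes`, `hamming_distances`, and `retrieve` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def to_hash_codes(features: np.ndarray) -> np.ndarray:
    """Binarize real-valued features into {0, 1} codes by thresholding at zero.

    Assumption: a simple sign threshold stands in for a learned hash encoder.
    """
    return (features > 0).astype(np.uint8)


def hamming_distances(query_code: np.ndarray, database_codes: np.ndarray) -> np.ndarray:
    """Count the differing bits between one query code and every database code."""
    return np.count_nonzero(database_codes != query_code, axis=1)


def retrieve(query_code: np.ndarray, database_codes: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Return indices of the top_k database items closest to the query."""
    distances = hamming_distances(query_code, database_codes)
    # A stable sort keeps database order among items with equal distance.
    return np.argsort(distances, kind="stable")[:top_k]


# Toy data (hypothetical): one image query and three text items, 4 bits each.
image_features = np.array([0.9, -0.2, 0.4, -0.7])
text_features = np.array([
    [0.8, -0.1, 0.3, -0.5],   # same sign pattern as the query
    [-0.6, 0.2, 0.1, -0.9],   # differs from the query in two positions
    [-0.3, 0.7, -0.4, 0.2],   # differs from the query in all four positions
])

query = to_hash_codes(image_features)
database = to_hash_codes(text_features)

distances = hamming_distances(query, database)
ranking = retrieve(query, database)
print("Hamming distances:", distances.tolist())
print("Ranking (closest first):", ranking.tolist())
```

Because both modalities share one binary code space, ranking reduces to counting differing bits, which is the practical appeal of hashing-based retrieval. How the codes are learned (the AIA and SPE modules described in the abstract) is outside the scope of this sketch.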
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
