Information-Theoretic Hashing for Zero-Shot Cross-Modal Retrieval

Yufeng Shi; Shujian Yu; Duanquan Xu; Xinge You

arXiv:2209.12491·cs.LG·September 27, 2022

Open Access

TL;DR

This paper introduces an information-theoretic approach to zero-shot cross-modal retrieval, constructing a common Hamming space without relying on pre-trained NLP embeddings, and demonstrates its effectiveness on benchmark datasets.

Contribution

The paper proposes a novel Information-Theoretic Hashing (ITH) model with adaptive aggregation and semantic-preserving encoding for zero-shot cross-modal retrieval.

Findings

01. ITH outperforms existing methods on three benchmark datasets.

02. The adaptive information aggregation effectively captures intrinsic semantics.

03. Semantic-preserving encoding maintains semantic similarity in the hash codes.

Abstract

Zero-shot cross-modal retrieval (ZS-CMR) deals with the retrieval problem among heterogeneous data from unseen classes. Typically, to guarantee generalization, pre-defined class embeddings from natural language processing (NLP) models are used to build a common space. In this paper, instead of using an extra NLP model to define a common space beforehand, we consider a totally different way to construct (or learn) a common Hamming space from an information-theoretic perspective. We term our model Information-Theoretic Hashing (ITH), which is composed of two cascading modules: an Adaptive Information Aggregation (AIA) module and a Semantic Preserving Encoding (SPE) module. Specifically, our AIA module takes inspiration from the Principle of Relevant Information (PRI) to construct a common space that adaptively aggregates the intrinsic semantics of different modalities of data…
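
To make the idea of retrieval in a common Hamming space concrete, here is a minimal sketch of how binary hash codes from two modalities might be compared. It is not the ITH model: the sign-thresholding step is a generic stand-in for a learned encoder (not the AIA or SPE modules), the feature values are made-up toy data, and the function names `binarize`, `hamming_distances`, and `retrieve` are hypothetical.

```python
import numpy as np


def binarize(features):
    """Map real-valued features to {0, 1} codes by thresholding at zero.

    Assumption: a simple stand-in for a learned hashing encoder.
    """
    return (np.asarray(features) > 0).astype(np.uint8)


def hamming_distances(query_code, database_codes):
    """Count the differing bits between one query code and each database code."""
    return np.count_nonzero(database_codes != query_code, axis=1)


def retrieve(query_code, database_codes, top_k=3):
    """Return indices of the top_k database items, nearest in Hamming distance first."""
    distances = hamming_distances(query_code, database_codes)
    return np.argsort(distances, kind="stable")[:top_k]


# Toy features, assumed to be already projected into a shared 4-dimensional space.
image_features = np.array([[0.9, -0.2, 0.4, -0.7]])
text_features = np.array([
    [-0.5, 0.3, -0.1, 0.8],   # every sign differs from the image query
    [0.7, -0.1, 0.2, -0.3],   # same sign pattern as the image query
    [0.6, 0.4, 0.1, -0.9],    # one sign differs from the image query
])

image_codes = binarize(image_features)
text_codes = binarize(text_features)

# Image-to-text retrieval: rank text items by Hamming distance to the image code.
ranking = retrieve(image_codes[0], text_codes, top_k=3)
print(ranking)
```

Because both modalities end up as bit vectors of the same length, cross-modal ranking reduces to counting mismatched bits, which is why hashing is attractive for large-scale retrieval.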

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications