Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations

Leila Pishdad; Ran Zhang; Konstantinos G. Derpanis; Allan Jepson; Afsaneh Fazly

arXiv:2204.09268·cs.LG·April 21, 2022

Open Access

TL;DR

This paper demonstrates that probabilistic embeddings improve cross-modal image-text retrieval by capturing uncertainty and ambiguity, outperforming traditional point embeddings across various benchmarks.

Contribution

The paper proposes a simple method that replaces the vector point embeddings in existing image-text matching models with parametrically learned probability distributions, with the aim of improving retrieval performance and modeling the uncertainty of each input.
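To make the idea of swapping a point embedding for a distribution concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the diagonal Gaussian parameterization, the squared 2-Wasserstein distance used for matching, and all names (`gaussian_embedding`, `wasserstein2_sq`, `log_spread`) are assumptions chosen for illustration, since this page does not state which distribution family or similarity the paper uses.

```python
import math

def linear(weights, bias, x):
    """Apply a dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def gaussian_embedding(features, w_mu, b_mu, w_logvar, b_logvar):
    """Map encoder features to a diagonal Gaussian (mu, sigma).

    A point-embedding model would return only `mu`. The second head
    (an assumption of this sketch) predicts a log-variance per dimension.
    """
    mu = linear(w_mu, b_mu, features)
    logvar = linear(w_logvar, b_logvar, features)
    sigma = [math.exp(0.5 * lv) for lv in logvar]  # always positive
    return mu, sigma

def wasserstein2_sq(emb_a, emb_b):
    """Squared 2-Wasserstein distance between two diagonal Gaussians:
    ||mu_a - mu_b||^2 + ||sigma_a - sigma_b||^2."""
    (mu_a, sig_a), (mu_b, sig_b) = emb_a, emb_b
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    spread_term = sum((a - b) ** 2 for a, b in zip(sig_a, sig_b))
    return mean_term + spread_term

def log_spread(emb):
    """Scalar uncertainty proxy: sum of log standard deviations."""
    _, sigma = emb
    return sum(math.log(s) for s in sigma)

# Hypothetical 2-d features and hand-picked weights, for illustration only.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
image_emb = gaussian_embedding([1.0, 2.0], identity, [0.0, 0.0], zeros_w, [0.0, 0.0])
text_emb = gaussian_embedding([1.0, 2.5], identity, [0.0, 0.0], zeros_w, [1.0, 1.0])
print(wasserstein2_sq(image_emb, text_emb), log_spread(text_emb))
```

In a retrieval setting, candidates would be ranked by increasing distance to the query distribution, and a quantity like `log_spread` could serve as a per-instance uncertainty score to compare against input ambiguity.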

Findings

01

Probabilistic embeddings outperform point embeddings in retrieval tasks.

02

Uncertainty captured by probabilistic models correlates with ambiguity in data.

03

The approach shows consistent improvements across multiple benchmarks.

Abstract

Probabilistic embeddings have proven useful for capturing polysemous word meanings, as well as ambiguity in image matching. In this paper, we study the advantages of probabilistic embeddings in a cross-modal setting (i.e., text and images), and propose a simple approach that replaces the standard vector point embeddings in extant image-text matching models with probabilistic distributions that are parametrically learned. Our guiding hypothesis is that the uncertainty encoded in the probabilistic embeddings captures the cross-modal ambiguity in the input instances, and that it is through capturing this uncertainty that the probabilistic models can perform better at downstream tasks, such as image-to-text or text-to-image retrieval. Through extensive experiments on standard and new benchmarks, we show a consistent advantage for probabilistic representations in cross-modal retrieval, and…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning