Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations
Leila Pishdad, Ran Zhang, Konstantinos G. Derpanis, Allan Jepson, Afsaneh Fazly

TL;DR
This paper shows that replacing point embeddings with probabilistic embeddings improves cross-modal image-text retrieval, consistently outperforming point embeddings on standard and new benchmarks. It attributes the gain to the embeddings capturing uncertainty and cross-modal ambiguity.
Contribution
The paper introduces a simple method that replaces the point embeddings in existing image-text matching models with parametrically learned probabilistic distributions, improving retrieval performance and adding a model of uncertainty.
Findings
Probabilistic embeddings outperform point embeddings in retrieval tasks.
The uncertainty captured by the probabilistic embeddings correlates with cross-modal ambiguity in the input instances.
The improvements are consistent across both standard and new benchmarks.
Abstract
Probabilistic embeddings have proven useful for capturing polysemous word meanings, as well as ambiguity in image matching. In this paper, we study the advantages of probabilistic embeddings in a cross-modal setting (i.e., text and images), and propose a simple approach that replaces the standard vector point embeddings in extant image-text matching models with probabilistic distributions that are parametrically learned. Our guiding hypothesis is that the uncertainty encoded in the probabilistic embeddings captures the cross-modal ambiguity in the input instances, and that it is through capturing this uncertainty that the probabilistic models can perform better at downstream tasks, such as image-to-text or text-to-image retrieval. Through extensive experiments on standard and new benchmarks, we show a consistent advantage for probabilistic representations in cross-modal retrieval, and…
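The abstract describes replacing point embeddings with parametrically learned distributions but does not spell out their form or how two distributions are scored. The following is a minimal sketch, assuming a diagonal Gaussian per instance (a mean and a log-variance, as an encoder head might predict) and a Monte Carlo match probability averaged over sampled embedding pairs. Candidates are ranked by that probability, and the mean per-dimension variance serves as one possible scalar uncertainty. The names `GaussianEmbedding`, `match_probability`, and `rank_candidates`, the sigmoid scoring with `scale` and `shift`, and the sample count are all hypothetical illustrations, not the paper's implementation.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


@dataclass
class GaussianEmbedding:
    """Hypothetical diagonal Gaussian N(mu, diag(exp(log_var))) replacing a point embedding."""
    mu: List[float]
    log_var: List[float]

    def sample(self, rng: random.Random) -> List[float]:
        # Reparameterized sample: z = mu + sigma * eps, with eps ~ N(0, I)
        return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
                for m, lv in zip(self.mu, self.log_var)]

    def uncertainty(self) -> float:
        # Assumed scalar uncertainty: mean per-dimension variance
        return sum(math.exp(lv) for lv in self.log_var) / len(self.log_var)


def match_probability(a: GaussianEmbedding, b: GaussianEmbedding,
                      num_samples: int = 7, scale: float = 5.0,
                      shift: float = 5.0, seed: int = 0) -> float:
    """Monte Carlo estimate of E[sigmoid(-scale * ||z_a - z_b|| + shift)] over sample pairs."""
    rng = random.Random(seed)  # fixed seed keeps scores reproducible
    za = [a.sample(rng) for _ in range(num_samples)]
    zb = [b.sample(rng) for _ in range(num_samples)]
    total = 0.0
    for x in za:
        for y in zb:
            total += _sigmoid(-scale * math.dist(x, y) + shift)
    return total / (num_samples * num_samples)


def rank_candidates(query: GaussianEmbedding,
                    candidates: Sequence[GaussianEmbedding]) -> List[int]:
    """Return candidate indices sorted by estimated match probability, best first."""
    scores = [match_probability(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy, hand-set distributions; a real model would predict mu and log_var
    image = GaussianEmbedding(mu=[1.0, 0.0], log_var=[-4.0, -4.0])
    close_text = GaussianEmbedding(mu=[0.9, 0.1], log_var=[-4.0, -4.0])
    far_text = GaussianEmbedding(mu=[-1.0, 0.0], log_var=[-4.0, -4.0])
    print(rank_candidates(image, [close_text, far_text]))  # prints [0, 1]
```

In a trained model, `mu` and `log_var` would come from the image and text encoders rather than being set by hand. A broader distribution (larger `log_var`) spreads its samples out, which tends to lower its match probabilities and raises `uncertainty()`; this is the kind of behavior the paper's hypothesis links to ambiguous inputs.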
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
