Probabilistic Embeddings for Cross-Modal Retrieval

Sanghyuk Chun; Seong Joon Oh; Rafael Sampaio de Rezende; Yannis Kalantidis; Diane Larlus

arXiv:2101.05068 · cs.CV · June 15, 2021 · 5 cites

TL;DR

This paper introduces Probabilistic Cross-Modal Embedding (PCME), which represents images and captions as probability distributions in a common embedding space to better handle one-to-many correspondences and improve retrieval performance.

Contribution

The paper proposes PCME, a probabilistic embedding method for cross-modal retrieval that captures uncertainty and improves over deterministic models, and supports it with comprehensive ablation studies.

Findings

01. PCME outperforms deterministic models in retrieval tasks.

02. It provides meaningful uncertainty estimates for embeddings.

03. Evaluation on COCO and CUB datasets demonstrates improved performance.

Abstract

Cross-modal retrieval methods build a common representation space for samples from multiple modalities, typically from the vision and the language domains. For images and their captions, the multiplicity of the correspondences makes the task particularly challenging. Given an image (respectively a caption), there are multiple captions (respectively images) that equally make sense. In this paper, we argue that deterministic functions are not sufficiently powerful to capture such one-to-many correspondences. Instead, we propose to use Probabilistic Cross-Modal Embedding (PCME), where samples from the different modalities are represented as probabilistic distributions in the common embedding space. Since common benchmarks such as COCO suffer from non-exhaustive annotations for cross-modal matches, we propose to additionally evaluate retrieval on the CUB dataset, a smaller yet clean…
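
The idea of representing each sample as a distribution rather than a single point can be illustrated with a minimal sketch. It assumes diagonal Gaussian embeddings and a sigmoid-of-distance match score estimated by Monte Carlo sampling. These choices, along with the names `sample_gaussian`, `match_probability`, `scale`, and `shift`, are illustrative assumptions and are not taken from the text above.

```python
import math
import random


def sample_gaussian(mean, std, rng):
    """Draw one sample from a diagonal Gaussian N(mean, diag(std^2))."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def match_probability(mean_a, std_a, mean_b, std_b,
                      n_samples=200, scale=1.0, shift=0.0, seed=0):
    """Monte Carlo estimate of a soft match score between two Gaussian embeddings.

    Averages sigmoid(-scale * distance + shift) over sampled pairs.
    `scale` and `shift` are hypothetical calibration parameters.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        za = sample_gaussian(mean_a, std_a, rng)
        zb = sample_gaussian(mean_b, std_b, rng)
        dist = math.dist(za, zb)  # Euclidean distance between the two samples
        total += 1.0 / (1.0 + math.exp(scale * dist - shift))
    return total / n_samples


# Toy 2-D embeddings as (mean, std) pairs; the values are made up for illustration.
image = ([0.0, 0.0], [0.1, 0.1])
caption_near = ([0.1, 0.0], [0.1, 0.1])
caption_far = ([3.0, 3.0], [0.1, 0.1])

p_near = match_probability(*image, *caption_near)
p_far = match_probability(*image, *caption_far)
print(p_near > p_far)
```

In this sketch the standard deviations act as a per-sample uncertainty: a larger `std` spreads the samples and softens the score, which is one way a single image could remain compatible with several distinct captions.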

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning