Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

Hao Li; Jingkuan Song; Lianli Gao; Xiaosu Zhu; Heng Tao Shen

arXiv:2309.17093 · cs.CV · January 17, 2024 · 6 citations

TL;DR

This paper introduces a Prototype-based Aleatoric Uncertainty Quantification framework for cross-modal retrieval, enhancing prediction reliability by modeling inherent data ambiguity using prototypes and evidential theories.

Contribution

It proposes a novel uncertainty quantification method that constructs modality-specific prototypes and employs Dempster-Shafer and Subjective Logic theories for trustworthy cross-modal retrieval.

Findings

01. Improves uncertainty estimation accuracy in cross-modal retrieval.

02. Achieves better retrieval performance on benchmark datasets.

03. Provides reliable predictions under data corruption scenarios.

Abstract

Cross-modal retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, their predictions are often unreliable because of aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework that provides trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity. Concretely, we first construct a diverse set of learnable prototypes for each modality to represent the entire semantic subspace. Then Dempster-Shafer Theory and Subjective Logic Theory are used to build an evidential theoretical framework by associating evidence with Dirichlet distribution parameters. The PAU model induces accurate uncertainty and…
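The link between evidence and Dirichlet parameters mentioned in the abstract can be illustrated with the standard Subjective Logic mapping. The sketch below is not taken from the paper or its repository: the function name `subjective_opinion` is hypothetical, and it assumes one non-negative evidence value per prototype has already been computed (how PAU derives that evidence is not stated in the abstract). Under that assumption, the Dirichlet parameters are alpha_k = e_k + 1, the belief masses are b_k = e_k / S, and the uncertainty mass is u = K / S, where S is the sum of all alpha_k.

```python
def subjective_opinion(evidence):
    """Turn per-prototype evidence into belief masses and an uncertainty mass.

    Assumption: `evidence` holds one non-negative number per prototype.
    This is the generic Subjective Logic mapping, not the paper's exact code.
    """
    if not evidence:
        raise ValueError("evidence must contain at least one value")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence values must be non-negative")

    num_prototypes = len(evidence)
    # Dirichlet concentration parameters: alpha_k = e_k + 1
    alpha = [e + 1.0 for e in evidence]
    # Dirichlet strength: S = sum of all alpha_k
    strength = sum(alpha)
    # Belief mass per prototype: b_k = e_k / S
    belief = [e / strength for e in evidence]
    # Uncertainty mass: u = K / S, so that sum(belief) + u == 1
    uncertainty = num_prototypes / strength
    return belief, uncertainty


# With no evidence at all, every belief mass is zero and uncertainty is maximal.
belief, uncertainty = subjective_opinion([0.0, 0.0, 0.0, 0.0])
print(belief, uncertainty)

# Strong evidence for one prototype lowers the uncertainty mass.
belief, uncertainty = subjective_opinion([9.0, 1.0, 0.0, 0.0])
print(belief, uncertainty)
```

Because the belief masses and the uncertainty mass always sum to one, a sample whose evidence is weak across all prototypes ends up with a high uncertainty value, which is the behaviour one would expect for ambiguous or corrupted inputs.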

Code & Models

Repositories

leolee99/pau (PyTorch, official)

Videos

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Padé Activation Units