Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval
Hao Li, Jingkuan Song, Lianli Gao, Xiaosu Zhu, Heng Tao Shen

TL;DR
This paper introduces a Prototype-based Aleatoric Uncertainty Quantification framework for cross-modal retrieval, enhancing prediction reliability by modeling inherent data ambiguity using prototypes and evidential theories.
Contribution
It proposes a novel uncertainty quantification method that constructs modality-specific prototypes and employs Dempster-Shafer and Subjective Logic theories for trustworthy cross-modal retrieval.
Findings
Improves uncertainty estimation accuracy in cross-modal retrieval.
Achieves better retrieval performance on benchmark datasets.
Provides reliable predictions under data corruption scenarios.
Abstract
Cross-modal retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity. Concretely, we first construct a set of various learnable prototypes for each modality to represent the entire semantic subspace. Then Dempster-Shafer Theory and Subjective Logic Theory are utilized to build an evidential theoretical framework by associating evidence with Dirichlet distribution parameters. The PAU model induces accurate uncertainty and…
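The step of associating evidence with Dirichlet distribution parameters can be illustrated with the standard Subjective Logic mapping: for K classes (here, K prototypes) with non-negative evidence e_k, set alpha_k = e_k + 1, S = sum of alpha_k, belief b_k = e_k / S, and uncertainty u = K / S, so that the beliefs and u sum to 1. The sketch below is a minimal illustration under that assumption, not the authors' implementation. The function names, the exponential similarity-to-evidence mapping, and the `scale` parameter are hypothetical and do not come from the paper.

```python
import math


def subjective_opinion(evidence):
    """Map non-negative per-prototype evidence to (beliefs, uncertainty).

    Standard Subjective Logic mapping: alpha_k = e_k + 1, S = sum(alpha),
    b_k = e_k / S, u = K / S, so sum(b) + u == 1.
    """
    if not evidence or any(e < 0 for e in evidence):
        raise ValueError("evidence must be a non-empty list of non-negative numbers")
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]  # Dirichlet parameters
    strength = sum(alpha)                # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = k / strength
    return beliefs, uncertainty


def evidence_from_similarities(sims, scale=5.0):
    """Hypothetical mapping (an assumption, not from the paper):
    turn similarities to each prototype into positive evidence via exp."""
    return [math.exp(scale * s) for s in sims]


# An item that matches one prototype clearly versus one that matches none well.
_, u_sharp = subjective_opinion(evidence_from_similarities([0.9, 0.1, 0.0]))
_, u_flat = subjective_opinion(evidence_from_similarities([0.1, 0.1, 0.1]))
print(f"sharp match uncertainty: {u_sharp:.3f}")
print(f"flat match uncertainty:  {u_flat:.3f}")
```

Under this mapping, an input with little total evidence across all prototypes receives an uncertainty close to 1, while strong evidence for any prototype pushes it toward 0, which is the behavior one would want for ambiguous or corrupted queries.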
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Prototype-based Aleatoric Uncertainty Quantification (PAU)
