Exploiting the Asymmetric Uncertainty Structure of Pre-trained VLMs on the Unit Hypersphere
Li Ju, Max Andersson, Stina Fredriksson, Edward Glöckner, Andreas Hellander, Ekta Vats, Prashant Singh

TL;DR
This paper introduces AsymVLM, a method to model the asymmetric uncertainty in pre-trained vision-language models by creating probabilistic embeddings on the unit hypersphere, improving uncertainty quantification.
Contribution
It proposes a novel approach to capture asymmetric uncertainty in VLMs by constructing probabilistic embeddings on the hypersphere, addressing limitations of previous methods.
Findings
Probabilistic embeddings improve uncertainty estimation.
AsymVLM outperforms existing post-hoc adaptation methods.
Ablation studies confirm the asymmetry in data uncertainty structures.
Abstract
Vision-language models (VLMs) as foundation models have significantly enhanced performance across a wide range of visual and textual tasks, without requiring large-scale training from scratch for downstream tasks. However, these deterministic VLMs fail to capture the inherent ambiguity and uncertainty in natural language and visual data. Recent probabilistic post-hoc adaptation methods address this by mapping deterministic embeddings onto probability distributions; however, existing approaches do not account for the asymmetric uncertainty structure of the modalities, and the constraint that meaningful deterministic embeddings reside on a unit hypersphere, potentially leading to suboptimal performance. In this paper, we address the asymmetric uncertainty structure inherent in textual and visual data, and propose AsymVLM to build probabilistic embeddings from pre-trained VLMs on the unit…
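To make the idea of a probabilistic embedding on the unit hypersphere concrete, here is a minimal sketch. It is not AsymVLM itself. It assumes a von Mises-Fisher (vMF) style parameterization, that is, a unit mean direction plus a concentration value kappa. The excerpt above does not say which distribution the paper uses, so that choice is an assumption. The function names, the embedding dimension of 8, and the kappa values are all hypothetical.

```python
import numpy as np


def to_unit_sphere(v, eps=1e-12):
    """L2-normalize vectors along the last axis so they lie on the unit hypersphere."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norm, eps)


def vmf_unnormalized_log_density(x, mu, kappa):
    """Unnormalized vMF log-density: kappa * <mu, x> for unit vectors x and mu.

    Assumption: vMF is used only as a familiar distribution on the sphere;
    the normalizing constant is omitted for brevity.
    """
    return kappa * np.sum(to_unit_sphere(x) * to_unit_sphere(mu), axis=-1)


# Hypothetical stand-ins for embeddings produced by a pre-trained VLM.
rng = np.random.default_rng(0)
text_mu = to_unit_sphere(rng.normal(size=8))      # mean direction of a text embedding
image_point = to_unit_sphere(rng.normal(size=8))  # deterministic image embedding

# A larger kappa means a more concentrated (less uncertain) distribution.
sharp = vmf_unnormalized_log_density(image_point, text_mu, kappa=50.0)
vague = vmf_unnormalized_log_density(image_point, text_mu, kappa=5.0)
```

In this sketch the score is the cosine similarity scaled by kappa, so a low concentration flattens the differences between candidate matches. That is one simple way an embedding can express ambiguity.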
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
