Intra-Class Probabilistic Embeddings for Uncertainty Estimation in Vision-Language Models

Zhenxiang Lin; Maryam Haghighat; Will Browne; Dimity Miller

arXiv:2511.22019 · cs.CV · December 9, 2025

TL;DR

This paper presents a training-free, post-hoc uncertainty estimation method for vision-language models that uses class-specific probabilistic embeddings to improve error detection, especially under distribution shifts.

Contribution

Introduces a training-free, post-hoc uncertainty estimation technique for contrastive VLMs that builds class-specific probabilistic embeddings from the consistency of visual features within each class.

Findings

01 Achieves state-of-the-art error detection on ImageNet, Flowers102, Food101, EuroSAT and DTD.

02 Robust to distribution shift, and effective with as few as 10 training images per class.

03 Outperforms existing deterministic and probabilistic baselines.

Abstract

Vision-language models (VLMs), such as CLIP, have gained popularity for their strong open-vocabulary classification performance, but they are prone to assigning high confidence scores to misclassifications, limiting their reliability in safety-critical applications. We introduce a training-free, post-hoc uncertainty estimation method for contrastive VLMs that can be used to detect erroneous predictions. The key to our approach is to measure visual feature consistency within a class, using feature projection combined with multivariate Gaussians to create class-specific probabilistic embeddings. Our method is VLM-agnostic, requires no fine-tuning, demonstrates robustness to distribution shift, and works effectively with as few as 10 training images per class. Extensive experiments on ImageNet, Flowers102, Food101, EuroSAT and DTD show state-of-the-art error detection performance,…
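
The abstract outlines a recipe: project visual features, fit one multivariate Gaussian per class, and use the fit to flag likely errors. The following is a minimal sketch of that idea, not the authors' implementation. It assumes PCA as the projection and the squared Mahalanobis distance to the predicted class's Gaussian as the uncertainty score; the abstract specifies neither. The function names (`fit_projection`, `fit_class_gaussians`, `uncertainty`), the `ridge` regularizer, and the synthetic features standing in for VLM image embeddings are all hypothetical.

```python
import numpy as np


def fit_projection(features, n_components):
    """Fit a PCA projection (an assumed stand-in for the paper's feature projection)."""
    mean = features.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components].T


def fit_class_gaussians(features, labels, mean, basis, ridge=1e-3):
    """Fit one Gaussian per class in the projected space.

    Returns {class_id: (class_mean, precision_matrix)}. The ridge term keeps
    the covariance invertible when only a few images per class are available.
    """
    z = (features - mean) @ basis
    gaussians = {}
    for c in np.unique(labels):
        zc = z[labels == c]
        mu = zc.mean(axis=0)
        cov = np.cov(zc, rowvar=False) + ridge * np.eye(z.shape[1])
        gaussians[int(c)] = (mu, np.linalg.inv(cov))
    return gaussians


def uncertainty(feature, predicted_class, mean, basis, gaussians):
    """Squared Mahalanobis distance to the predicted class's Gaussian.

    Larger values mean the image feature is less consistent with the class
    the model predicted, so the prediction is more likely to be an error.
    """
    z = (feature - mean) @ basis
    mu, precision = gaussians[predicted_class]
    d = z - mu
    return float(d @ precision @ d)


# Synthetic demo: two classes, 10 "training images" each, 16-dim features.
rng = np.random.default_rng(0)
train = np.vstack([
    rng.normal(3.0, 0.5, size=(10, 16)),   # class 0
    rng.normal(-3.0, 0.5, size=(10, 16)),  # class 1
])
train_labels = np.array([0] * 10 + [1] * 10)

pca_mean, pca_basis = fit_projection(train, n_components=4)
class_gaussians = fit_class_gaussians(train, train_labels, pca_mean, pca_basis)

query = rng.normal(3.0, 0.5, size=16)  # drawn from class 0
score_if_correct = uncertainty(query, 0, pca_mean, pca_basis, class_gaussians)
score_if_wrong = uncertainty(query, 1, pca_mean, pca_basis, class_gaussians)
print(score_if_correct, score_if_wrong)
```

In this sketch the score is computed after the VLM has made its prediction, so it needs no fine-tuning of the model. Thresholding the score, or ranking predictions by it, is one way it could be used for error detection.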

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques