To Trust Or Not To Trust Your Vision-Language Model's Prediction

Hao Dong; Moru Liu; Jian Liang; Eleni Chatzi; Olga Fink

arXiv:2505.23745 · cs.CV · September 25, 2025

TL;DR

TrustVLM is a training-free framework that estimates when a vision-language model's prediction can be trusted, so that likely misclassifications can be flagged in safety-critical applications without retraining the model.

Contribution

The paper introduces TrustVLM, a confidence-scoring method that leverages the image embedding space to detect misclassifications, improving the trustworthiness of VLMs without additional training.
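The idea of scoring confidence in the image embedding space can be illustrated with a minimal sketch. This is not the paper's exact formula: it assumes that one prototype image embedding per class is available, and it simply adds the zero-shot softmax probability of the predicted class (image-to-text) to the cosine similarity between the test image and that class's prototype (image-to-image). The names `trust_score`, `prototypes`, and the `temperature` value are hypothetical and chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def trust_score(image_emb, text_embs, prototypes, temperature=0.01):
    """Return (predicted_class, confidence) for one image embedding.

    Assumption (not taken from the paper): the confidence is the sum of
    the image-to-text softmax probability and the image-to-image cosine
    similarity to the predicted class's visual prototype.
    """
    # Zero-shot prediction from image-to-text similarities.
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    image_to_text = probs[pred]
    # Check the prediction against the visual prototype of that class.
    image_to_image = cosine(image_emb, prototypes[pred])
    return pred, image_to_text + image_to_image
```

In this sketch, a prediction that the text branch makes with high probability still receives a lower score when the image lies far from the predicted class's prototype, which is the kind of disagreement a misclassification detector can exploit.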

Findings

01 · Achieved up to 51.87% improvement in AURC

02 · Demonstrated state-of-the-art detection performance across datasets

03 · Validated effectiveness on multiple architectures and VLMs
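For readers unfamiliar with the metric, AURC is the area under the risk-coverage curve: samples are ranked by confidence, and the error rate (risk) among the top-k most confident samples is averaged over all k. Lower values are better. The following is a minimal sketch of that standard definition; the function name `aurc` is hypothetical, and this is not the paper's evaluation code.

```python
def aurc(confidences, correct):
    """Area under the risk-coverage curve (lower is better).

    confidences: one confidence score per sample.
    correct: one boolean per sample, True if the prediction was right.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    # Rank samples from most to least confident.
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    errors = 0
    risk_sum = 0.0
    for k, i in enumerate(order, start=1):
        if not correct[i]:
            errors += 1
        # Risk at coverage k/n: error rate among the k most confident.
        risk_sum += errors / k
    return risk_sum / len(order)
```

A confidence score that ranks all correct predictions above all incorrect ones gives the lowest AURC for a given accuracy, so the metric rewards scores that separate errors from correct predictions.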

Abstract

Vision-Language Models (VLMs) have demonstrated strong capabilities in aligning visual and textual modalities, enabling a wide range of applications in multimodal understanding and generation. While they excel in zero-shot and transfer learning scenarios, VLMs remain susceptible to misclassification, often yielding confident yet incorrect predictions. This limitation poses a significant risk in safety-critical domains, where erroneous predictions can lead to severe consequences. In this work, we introduce TrustVLM, a training-free framework designed to address the critical challenge of estimating when a VLM's predictions can be trusted. Motivated by the observed modality gap in VLMs and the insight that certain concepts are more distinctly represented in the image embedding space, we propose a novel confidence-scoring function that leverages this space to improve misclassification…

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. TrustVLM is a training-free framework designed to evaluate the reliability of VLM predictions. One of its key advantages is that it does not require additional training, which makes it convenient to apply in scenarios where labeled data is limited or unavailable. The framework combines both image-to-text and image-to-image similarities, which allows for a more robust and nuanced design of confidence scores. This combination provides a richer representation of the visual information, enabling

Weaknesses

1. A notable limitation of this method is that it relies on the availability of in-domain data that includes images for all classes to be predicted. Under this assumption, the method can extract and store visual prototypes for each class, which are then used for confidence estimation. However, in many practical scenarios, obtaining such in-domain data for every class may be difficult or infeasible. Moreover, if the training or reference data does not fully cover the diversity of the test data, t

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- This work addresses an important task: determining when the predictions of a VLM are likely to be reliable.
- Although the proposed method is methodologically straightforward, strong performance is observed across a range of datasets and model backbones. The authors also compare with multiple baselines. The distribution shift experiments with ImageNet are particularly compelling.

Weaknesses

- **Need for finer-grained analysis:** This paper could benefit from additional fine-grained analysis with respect to when the proposed method is most effective (rather than just overall metrics). For example, are there specific classes where misclassification detection performance improves substantially when using the proposed method (as compared to MSP)? What types of characteristics are common among those classes?
- **Variance of performance:** The proposed method is likely very sensitive to

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- This paper is easy to follow, the motivation is very clear, and the intuition is quite straightforward.
- The proposed TrustVLM is training-free and efficient to deploy. It can also be easily adopted by any VLM architecture.
- The experimental performance is quite promising.

Weaknesses

- The major concern is the missing comparison with unimodal detection methods. The proposed method combines multimodal information to detect prediction errors; however, the ablation study contains no comparison with image-only or text-only detection. Such a comparison would make clear which modality branch contributes more to the overall performance improvement.
- Moreover, the performance of TrustVLM highly relies on the performance of the employed VLMs; if the VLMs cannot provide hig

Code & Models

Repositories

epfl-imos/trustvlm · PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Ethics and Social Impacts of AI