Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification
Tao Huang, Rui Wang, Xiaofei Liu, Yi Qin, Li Duan, Liping Jing

TL;DR
This paper introduces Evidential Uncertainty Quantification (EUQ), a novel method for detecting misbehaviors in large vision-language models by analyzing internal conflict and ignorance, improving reliability in critical applications.
Contribution
The paper proposes EUQ, a fine-grained uncertainty quantification approach that captures both conflict and ignorance, enhancing detection of model misbehaviors over existing methods.
Findings
EUQ outperforms strong baselines across multiple misbehavior categories.
Hallucinations are linked to high internal conflict.
Out-of-distribution failures are associated with high ignorance.
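The difference between conflict and ignorance in the findings above can be made concrete with a small sketch. This page does not give EUQ's actual formulas, so the code below is not the authors' method. It assumes a textbook Dempster-Shafer setting in which two evidence sources assign mass to answers or to the whole answer set, and all names (`combine`, `conflict_and_ignorance`, the `yes`/`no` frame) are hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Unnormalized conjunctive combination of two mass functions.

    Each mass function maps a frozenset of answers to a mass in [0, 1].
    Returns the combined masses on non-empty sets and the conflict mass.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # The two sources support incompatible answers.
            conflict += wa * wb
    return combined, conflict


def conflict_and_ignorance(m1, m2, frame):
    """Conflict: mass on the empty set. Ignorance: mass left on the whole frame."""
    combined, conflict = combine(m1, m2)
    return conflict, combined.get(frame, 0.0)


frame = frozenset({"yes", "no"})
yes, no = frozenset({"yes"}), frozenset({"no"})

# Two confident sources that disagree (illustrative values, not from the paper).
strong_disagreement = ({yes: 0.9, frame: 0.1}, {no: 0.9, frame: 0.1})
# Two sources that barely commit to anything (illustrative values).
weak_evidence = ({yes: 0.1, frame: 0.9}, {no: 0.1, frame: 0.9})

for name, (m1, m2) in [("disagreement", strong_disagreement),
                       ("weak evidence", weak_evidence)]:
    cf, ig = conflict_and_ignorance(m1, m2, frame)
    print(f"{name}: conflict={cf:.2f}, ignorance={ig:.2f}")
```

In this toy setting, confident but contradictory sources give high conflict and low ignorance, while uncommitted sources give the reverse. This mirrors the pattern the findings describe for hallucinations and out-of-distribution failures, without claiming to reproduce the paper's computation.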
Abstract
Large vision-language models (LVLMs) have shown substantial advances in multimodal understanding and generation. However, when presented with incompetent or adversarial inputs, they frequently produce unreliable or even harmful content, such as fact hallucinations or dangerous instructions. This misalignment with human expectations, referred to as misbehaviors of LVLMs, raises serious concerns for deployment in critical applications. These misbehaviors are found to stem from epistemic uncertainty, specifically either conflicting internal knowledge or the absence of supporting information. However, existing uncertainty quantification methods, which typically capture only overall epistemic uncertainty, have shown limited effectiveness in identifying such issues. To address this gap, we propose Evidential Uncertainty Quantification (EUQ), a fine-grained method that captures both…
Peer Reviews
Decision: ICLR 2026 Poster
- The method seems to be very mathematically rigorous.
- The paper considered lots of datasets and models.
- The proposed method does not seem to have much novelty; I can't see why it is specific to VLMs or why it cannot be applied to, e.g., BERT, ResNet, or LLMs.
- It is unclear whether the method is applicable to closed-source models, since it requires access to logits.
- The choice of baselines is a little confusing: semantic entropy is for uncertainty quantification over free-form generation, but many tasks here only require a single word as the output (if I understand correctly
The main strengths of the paper are: 1) Disaggregating the uncertainty into conflict and ignorance uncertainty to interpret the uncertainty of a VLM. This allows the authors to measure uncertainty in different contexts (e.g. hallucination, jailbreaking, out-of-distribution generalization). 2) The experiments are thorough and conducted on multiple model families and expressly evaluated at many scales.
There are no major weaknesses in the paper. However, in Figure 1 in the paper, the authors show an illustrative example of measuring uncertainty in chain-of-thought reasoning. It would be useful to see examples of how the authors' proposed method can identify uncertainty in these reasoning traces. Currently the authors only evaluate their method on benchmarks which often only measure uncertainty on shorter token sequences.
An interesting paper to read: it classifies different types of misbehaviors in VLMs and observes that CF/IG can be used to distinguish between them.
1. Although Figure 4 and the appendix visualizations distinguish misbehavior types, there is no deeper linguistic or visual semantic analysis explaining why certain errors yield high CF or IG.
2. Thresholding (which could vary across LVLMs, datasets, or misbehavior categories) would have to be determined externally. Additionally, since the authors propose metrics to evaluate misbehaviors in VLMs and make observations, the size of the datasets and the chosen VLMs (four VLMs with ≤ 8B parameters)
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
