Uncertainty-Aware Evaluation for Vision-Language Models

Vasily Kostumov; Bulat Nutfullin; Oleg Pilipenko; Eugene Ilyushin

arXiv:2402.14418 · cs.CV · February 27, 2024 · 2 cites

TL;DR

This paper introduces an uncertainty-aware benchmark for vision-language models, highlighting the importance of uncertainty quantification in evaluating model performance across multiple datasets and revealing misalignments between uncertainty and accuracy.

Contribution

It presents a novel benchmark incorporating uncertainty estimation into VLM evaluation, using conformal prediction to analyze 20+ models on diverse vision-language tasks.

Findings

01. Uncertainty is not aligned with accuracy in VLMs

02. Models with higher accuracy can have higher uncertainty

03. Uncertainty correlates with the language model component

Abstract

Vision-Language Models like GPT-4, LLaVA, and CogVLM have surged in popularity recently due to their impressive performance in several vision-language tasks. Current evaluation methods, however, overlook an essential component: uncertainty, which is crucial for a comprehensive assessment of VLMs. Addressing this oversight, we present a benchmark incorporating uncertainty quantification into evaluating VLMs. Our analysis spans 20+ VLMs, focusing on the multiple-choice Visual Question Answering (VQA) task. We examine models on 5 datasets that evaluate various vision-language capabilities. Using conformal prediction as an uncertainty estimation approach, we demonstrate that the models' uncertainty is not aligned with their accuracy. Specifically, we show that models with the highest accuracy may also have the highest uncertainty, which confirms the importance of measuring it for VLMs.…
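
To make the abstract's central idea concrete, the following is a minimal sketch of split conformal prediction applied to multiple-choice answers. It is an illustration under stated assumptions, not the authors' implementation: this page does not give the score function, error rate, or calibration split used in the paper, so the sketch assumes the common "1 minus probability of the true option" score, and the names `conformal_threshold`, `prediction_set`, and `evaluate` are hypothetical. The average prediction-set size serves as the uncertainty measure, and it is reported next to accuracy so that the two can be compared.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold on held-out questions.

    Assumption: the nonconformity score is 1 - p(true option).
    cal_probs is a list of per-option probability lists,
    cal_labels is a list of correct option indices.
    """
    n = len(cal_labels)
    scores = sorted(1.0 - probs[label]
                    for probs, label in zip(cal_probs, cal_labels))
    # Finite-sample corrected rank of the (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: keep every option.
        return 1.0
    return scores[k - 1]


def prediction_set(probs, qhat):
    """Return the option indices whose score 1 - p is within the threshold."""
    return [i for i, p in enumerate(probs) if 1.0 - p <= qhat]


def evaluate(test_probs, test_labels, qhat):
    """Return (accuracy, coverage, mean prediction-set size)."""
    n = len(test_labels)
    sets = [prediction_set(p, qhat) for p in test_probs]
    # Top-1 prediction is the first option with the highest probability.
    top1 = [max(range(len(p)), key=p.__getitem__) for p in test_probs]
    accuracy = sum(t == y for t, y in zip(top1, test_labels)) / n
    coverage = sum(y in s for s, y in zip(sets, test_labels)) / n
    mean_set_size = sum(len(s) for s in sets) / n
    return accuracy, coverage, mean_set_size
```

Under this sketch, a model whose prediction sets are large on average is more uncertain, regardless of how often its top-1 answer is correct. That is how a model could in principle rank first on accuracy and still show high uncertainty.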

Code & Models

Repositories

ensec-ai/vlm-uncertainty-bench (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Semantic Web and Ontologies · Constraint Satisfaction and Optimization

Methods: Linear Layer · Dropout · Dense Connections · Label Smoothing · Adam · Attention Is All You Need · Softmax · Multi-Head Attention · Layer Normalization · Residual Connection