Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies between Model Predictions and Human Responses in VQA

Jian Lan; Diego Frassinelli; Barbara Plank

arXiv:2410.02773·cs.CV·October 7, 2024

TL;DR

This paper evaluates how well vision-language models, especially BEiT3, align with the diverse and uncertain responses of humans in VQA tasks, revealing current limitations and proposing calibration improvements.

Contribution

It introduces new metrics to assess model alignment with human response distributions and analyzes the impact of calibration techniques on capturing human uncertainty in VQA.

Findings

01. BEiT3 struggles to model multi-label human response distributions.

02. Calibration techniques aimed at accuracy can worsen alignment with human responses.

03. Calibrating models towards human distributions improves alignment with human uncertainty.
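
To make the contrast between findings 02 and 03 concrete, here is a minimal sketch that assumes temperature scaling as the calibration technique. This summary does not say which calibration methods the paper uses, so the technique, the function names (`softmax`, `soft_cross_entropy`, `fit_temperature`), the grid of temperatures, and the toy logits are all illustrative assumptions. The sketch fits one temperature against either one-hot targets (accuracy-style) or the soft distribution of human answers.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a (possibly soft) target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))

def fit_temperature(logits_batch, targets_batch, grid=None):
    """Pick the temperature from a grid that minimises mean cross-entropy.

    Assumption: a simple grid search stands in for whatever optimiser
    a real calibration pipeline would use.
    """
    grid = grid or [0.25 * k for k in range(1, 41)]  # 0.25, 0.50, ..., 10.0

    def mean_loss(t):
        losses = [soft_cross_entropy(y, softmax(z, t))
                  for z, y in zip(logits_batch, targets_batch)]
        return sum(losses) / len(losses)

    return min(grid, key=mean_loss)

# Hypothetical overconfident logits for one question with three candidate answers.
logits = [[4.0, 0.0, 0.0]]
one_hot = [[1.0, 0.0, 0.0]]      # majority answer only
human_soft = [[0.5, 0.3, 0.2]]   # share of annotators per answer

t_accuracy = fit_temperature(logits, one_hot)
t_human = fit_temperature(logits, human_soft)
```

In this toy setup the one-hot target pushes the temperature to the bottom of the grid (a sharper prediction), while the soft human target selects a temperature above 1 (a flatter prediction). That is one way a calibration objective tied to the single correct answer could move a model away from the human response distribution.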

Abstract

Large vision-language models frequently struggle to accurately predict responses provided by multiple human annotators, particularly when those responses exhibit human uncertainty. In this study, we focus on the Visual Question Answering (VQA) task, and we comprehensively evaluate how well the state-of-the-art vision-language models correlate with the distribution of human responses. To do so, we categorize our samples based on their levels (low, medium, high) of human uncertainty in disagreement (HUD) and employ not only accuracy but also three new human-correlated metrics in VQA, to investigate the impact of HUD. To better align models with humans, we also verify the effect of common calibration and human calibration. Our results show that even BEiT3, currently the best model for this task, struggles to capture the multi-label distribution inherent in diverse human responses.…
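
The abstract's idea of grouping samples by human uncertainty and comparing model output against the distribution of human answers can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the entropy-based uncertainty score, the `low` and `high` thresholds, and the use of KL divergence as the comparison are all chosen here for the example, and the summary does not name the paper's three metrics.

```python
import math
from collections import Counter

def human_distribution(answers):
    """Turn a list of annotator answers into a probability distribution."""
    counts = Counter(answers)
    total = len(answers)
    return {answer: count / total for answer, count in counts.items()}

def entropy(dist):
    """Shannon entropy (in nats) of a distribution given as a dict."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def uncertainty_level(answers, low=0.5, high=1.0):
    """Bucket a sample as low/medium/high uncertainty.

    Assumption: the thresholds are arbitrary illustrative values.
    """
    h = entropy(human_distribution(answers))
    if h < low:
        return "low"
    if h < high:
        return "medium"
    return "high"

def kl_divergence(human, model, eps=1e-12):
    """KL(human || model); answers missing from the model get probability eps."""
    return sum(p * math.log(p / max(model.get(answer, 0.0), eps))
               for answer, p in human.items() if p > 0)

# Hypothetical annotations from ten annotators for one question.
answers = ["red"] * 5 + ["orange"] * 3 + ["pink"] * 2
human = human_distribution(answers)
model = {"red": 0.9, "orange": 0.05, "pink": 0.05}  # made-up model probabilities
level = uncertainty_level(answers)
gap = kl_divergence(human, model)
```

A model that puts nearly all its mass on the majority answer can still be counted as accurate here while having a large divergence from the human distribution, which is the kind of discrepancy the abstract describes.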

Taxonomy

Topics: Risk and Safety Analysis · Bayesian Modeling and Causal Inference · Explainable Artificial Intelligence (XAI)