VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models
Haoyi Qiu, Wenbo Hu, Zi-Yi Dou, Nanyun Peng

TL;DR
This paper introduces a comprehensive evaluation framework for large vision-language models, addressing hallucination issues by measuring coverage and faithfulness across multiple dimensions, and demonstrating improved correlation with human judgments.
Contribution
We propose a multi-dimensional benchmark and an LLM-based two-stage evaluation framework that better assesses hallucinations in LVLMs, surpassing existing metrics in comprehensiveness and human correlation.
Findings
Our benchmark covers objects, attributes, and relations.
The evaluation framework outperforms existing metrics in correlation with human judgments.
Experiments on 10 LVLMs reveal insights into the balance between hallucination and informativeness.
Abstract
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs, undermining their reliability. A comprehensive quantitative evaluation is necessary to identify and understand the extent of hallucinations in these models. However, existing benchmarks are often limited in scope, focusing mainly on object hallucinations. Furthermore, current evaluation methods struggle to effectively address the subtle semantic distinctions between model outputs and reference data, as well as the balance between hallucination and informativeness. To address these issues, we introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases. Moreover, we propose a large language model (LLM)-based two-stage evaluation framework that generalizes…
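To make the two quantities in the title concrete, here is a minimal sketch of how coverage and faithfulness could be computed once features (objects, attributes, or relations) have been extracted from a model's output. It is an illustration under stated assumptions, not the authors' implementation: the function `score_caption`, the `Scores` type, and the exact-match comparison after normalization are all hypothetical, and the paper's framework uses an LLM for the extraction and matching stages rather than string equality. The sketch assumes faithfulness is the share of generated features that are supported by the reference, and coverage is the share of reference features that the output mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    faithfulness: float  # share of generated features found in the reference
    coverage: float      # share of reference features found in the output


def normalize(feature: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(feature.lower().split())


def score_caption(extracted: list[str], reference: list[str]) -> Scores:
    """Hypothetical scoring step: compare extracted features with references.

    Assumption: exact match after normalization stands in for the
    LLM-based semantic matching described in the abstract.
    """
    gen = {normalize(f) for f in extracted}
    ref = {normalize(f) for f in reference}
    matched = gen & ref
    # Convention chosen for this sketch: an empty set scores 1.0 (nothing to get wrong).
    faithfulness = len(matched) / len(gen) if gen else 1.0
    coverage = len(matched) / len(ref) if ref else 1.0
    return Scores(faithfulness, coverage)


if __name__ == "__main__":
    scores = score_caption(
        extracted=["Dog", "frisbee", "car"],
        reference=["dog", "frisbee", "grass", "tree"],
    )
    print(scores)
```

In this toy case two of the three generated features are supported and two of the four reference features are mentioned. A model that says very little can score high on faithfulness while scoring low on coverage, which is the hallucination versus informativeness trade-off the abstract refers to.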
Taxonomy
Topics: Topic Modeling · Media, Religion, Digital Communication · Religion and Sociopolitical Dynamics in Nigeria
