VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models

Haoyi Qiu; Wenbo Hu; Zi-Yi Dou; Nanyun Peng

arXiv:2404.13874·cs.CL·October 7, 2024

TL;DR

This paper introduces a comprehensive evaluation framework for large vision-language models, addressing hallucination issues by measuring coverage and faithfulness across multiple dimensions, and demonstrating improved correlation with human judgments.

Contribution

We propose a multi-dimensional benchmark and an LLM-based two-stage evaluation framework that assesses hallucinations in LVLMs more comprehensively than existing metrics and correlates better with human judgments.

Findings

01

Our benchmark covers objects, attributes, and relations.

02

The evaluation framework outperforms existing metrics in correlation with human judgments.

03

Experiments on 10 LVLMs reveal insights into the balance between hallucination and informativeness.

Abstract

Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs, undermining their reliability. A comprehensive quantitative evaluation is necessary to identify and understand the extent of hallucinations in these models. However, existing benchmarks are often limited in scope, focusing mainly on object hallucinations. Furthermore, current evaluation methods struggle to effectively address the subtle semantic distinctions between model outputs and reference data, as well as the balance between hallucination and informativeness. To address these issues, we introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases. Moreover, we propose a large language model (LLM)-based two-stage evaluation framework that generalizes…
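To make the two quantities in the title concrete, here is a minimal sketch of how coverage and faithfulness could be scored for one image. It assumes, as a reading of the abstract rather than the paper's stated definitions, that faithfulness is precision-like (the share of generated items supported by the reference) and coverage is recall-like (the share of reference items the model mentions). The function name `coverage_and_faithfulness` is hypothetical, and exact string matching stands in for the LLM-based matching the paper proposes.

```python
def coverage_and_faithfulness(generated, reference):
    """Return (coverage, faithfulness) for one image.

    Assumption: items are short strings (objects, attributes, or relations)
    and two items match only if they are identical after normalization.
    The paper uses an LLM to handle semantic matches; this sketch does not.
    """
    gen = {g.strip().lower() for g in generated}
    ref = {r.strip().lower() for r in reference}
    matched = gen & ref

    # Faithfulness: fraction of what the model said that the reference supports.
    faithfulness = len(matched) / len(gen) if gen else 0.0
    # Coverage: fraction of the reference content that the model mentioned.
    coverage = len(matched) / len(ref) if ref else 0.0
    return coverage, faithfulness


if __name__ == "__main__":
    generated = ["dog", "frisbee", "cat"]           # "cat" is hallucinated
    reference = ["dog", "frisbee", "grass", "tree"]
    cov, faith = coverage_and_faithfulness(generated, reference)
    print(f"coverage={cov:.2f} faithfulness={faith:.2f}")
```

Under this reading, a model that says very little can score high faithfulness with low coverage, which is the hallucination versus informativeness trade-off the abstract describes.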

Code & Models

Repositories

haoyiq114/valor (official)
