Measuring the Measurers: Quality Evaluation of Hallucination Benchmarks for Large Vision-Language Models
Bei Yan, Jie Zhang, Zheng Yuan, Shiguang Shan, Xilin Chen

TL;DR
This paper critically evaluates existing hallucination benchmarks for large vision-language models, introduces a new quality measurement framework, and proposes a high-quality benchmark to improve reliability and validity in hallucination assessment.
Contribution
The paper introduces HQM, a framework that measures the quality of hallucination benchmarks in terms of reliability and validity, and HQH, a new high-quality benchmark. Applying HQM to existing benchmarks exposes inconsistent results and other evaluation issues.
Findings
Some existing benchmarks produce inconsistent evaluation results across repeated tests.
Some existing benchmarks fail to align with human evaluation.
The proposed HQH benchmark demonstrates superior reliability and validity.
Abstract
Despite the outstanding performance in multimodal tasks, Large Vision-Language Models (LVLMs) have been plagued by the issue of hallucination, i.e., generating content that is inconsistent with the corresponding visual inputs. While previous works have proposed various benchmarks to evaluate this issue, the quality of these evaluations remains unverified. We observe that some of these benchmarks may produce inconsistent evaluation results across repeated tests or fail to align with human evaluation. To address this, we propose a Hallucination benchmark Quality Measurement framework (HQM), which leverages specific indicators to assess both reliability and validity. Our empirical analysis using HQM reveals and pinpoints potential evaluation issues in existing benchmarks, exposing a critical gap in current hallucination evaluation. To bridge this gap, we propose HQH, a High-Quality…
Taxonomy
Topics: Brain Tumor Detection and Classification · Cell Image Analysis Techniques · Epilepsy research and treatment
