CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models
Guangzhi Sun, Potsawee Manakul, Adian Liusie, Kunat Pipatanakul, Chao Zhang, Phil Woodland, Mark Gales

TL;DR
CrossCheckGPT introduces a universal, reference-free hallucination ranking method for multimodal models, leveraging cross-system consistency to evaluate hallucination robustness across various modalities.
Contribution
It proposes a novel, universal hallucination ranking approach that does not rely on gold references, applicable across different tasks and modalities, and introduces the first audio-visual hallucination benchmark.
Findings
Achieves 98% correlation with human judgments on MHaluBench.
Achieves 89% correlation with human judgments on AVHalluBench.
Demonstrates effectiveness across text, image, and audio-visual domains.
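For readers unfamiliar with how a ranking is compared against human judgments, the following is a minimal sketch of a rank correlation computed between per-system scores from an automatic metric and from human annotators. This listing does not say which correlation coefficient the figures above refer to, so the use of Spearman's coefficient here is an assumption, and the function names (`average_ranks`, `pearson`, `spearman`) are illustrative only.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A value near 1 means the automatic metric orders the systems almost exactly as the human annotators do; the metric's raw scale does not matter, only the ordering it induces.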
Abstract
Multimodal foundation models are prone to hallucination, generating outputs that either contradict the input or are not grounded by factual information. Given the diversity in architectures, training data and instruction tuning techniques, there can be large variations in systems' susceptibility to hallucinations. To assess system hallucination robustness, hallucination ranking approaches have been developed for specific tasks such as image captioning, question answering, summarization, or biography generation. However, these approaches typically compare model outputs to gold-standard references or labels, limiting hallucination benchmarking for new domains. This work proposes "CrossCheckGPT", a reference-free universal hallucination ranking for multimodal foundation models. The core idea of CrossCheckGPT is that the same hallucinated content is unlikely to be generated by different…
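The cross-system consistency idea can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated here and does not describe the scoring procedure, so the sketch substitutes a crude word-overlap measure for whatever consistency check the paper uses. All names (`support`, `hallucination_score`, `rank_systems`) and the example outputs are hypothetical.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def support(sentence: str, evidence: str) -> float:
    """Fraction of the sentence's words that also appear in the evidence text.

    Assumption: word overlap stands in for a real consistency check.
    """
    words = tokens(sentence)
    if not words:
        return 1.0
    return len(words & tokens(evidence)) / len(words)


def hallucination_score(target: str, outputs: Dict[str, str]) -> float:
    """Mean lack of support for the target system's sentences in other systems' outputs."""
    sentences = [s.strip() for s in outputs[target].split(".") if s.strip()]
    others = [text for name, text in outputs.items() if name != target]
    if not sentences or not others:
        return 0.0
    per_sentence = [
        1.0 - sum(support(s, o) for o in others) / len(others)
        for s in sentences
    ]
    return sum(per_sentence) / len(per_sentence)


def rank_systems(outputs: Dict[str, str]) -> List[str]:
    """Order systems from lowest to highest hallucination score."""
    scores = {name: hallucination_score(name, outputs) for name in outputs}
    return sorted(scores, key=scores.get)


# Hypothetical outputs from three systems describing the same input.
example = {
    "sys_a": "A dog runs on grass.",
    "sys_b": "A dog runs on grass.",
    "sys_c": "A purple dragon eats pizza.",
}
ranking = rank_systems(example)
```

In this example `sys_c` ranks last because almost none of its content is echoed by the other two systems. No gold reference is consulted at any point: each system is judged only against its peers.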
Taxonomy
Topics: Topological and Geometric Data Analysis
