ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models

Yahan Tu; Rui Hu; Jitao Sang

arXiv:2409.09318 · cs.CL · July 8, 2025

TL;DR

This paper introduces ODE, a dynamic, open-set protocol for evaluating object hallucinations in multimodal large language models (MLLMs). It targets the data-contamination risk of static benchmarks, and MLLMs show higher hallucination rates on ODE-generated samples.

Contribution

We propose ODE, a novel graph-based, open-set evaluation protocol that enhances hallucination assessment in MLLMs and mitigates data contamination risks.

Findings

01. MLLMs show increased hallucination rates with ODE-generated samples.

02. ODE effectively reveals hallucination patterns and aids in model fine-tuning.

03. The protocol is applicable to both general and specialized scenarios.

Abstract

Hallucination poses a persistent challenge for multimodal large language models (MLLMs). However, existing benchmarks for evaluating hallucinations are generally static, which may overlook the potential risk of data contamination. To address this issue, we propose ODE, an open-set, dynamic protocol designed to evaluate object hallucinations in MLLMs at both the existence and attribute levels. ODE employs a graph-based structure to represent real-world object concepts, their attributes, and the distributional associations between them. This structure facilitates the extraction of concept combinations based on diverse distributional criteria, generating varied samples for structured queries that evaluate hallucinations in both generative and discriminative tasks. Through the generation of new samples, dynamic concept combinations, and varied distribution frequencies, ODE mitigates the…
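The graph-based idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy `scenes` data, the function names, the three selection criteria ("frequent", "rare", "unseen"), and the query template are all assumptions made for illustration. It builds a co-occurrence graph over object concepts, picks concept pairs under different distributional criteria, and turns a concept into a discriminative existence query.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy annotations: each entry lists the objects seen in one image.
scenes = [
    ["dog", "frisbee", "grass"],
    ["dog", "grass"],
    ["dog", "frisbee"],
    ["cat", "sofa"],
    ["cat", "sofa", "lamp"],
    ["dog", "sofa"],
]


def build_cooccurrence(scenes):
    """Count how often each unordered pair of concepts appears together.

    The result acts as a weighted graph: keys are edges (sorted concept
    pairs), values are co-occurrence counts.
    """
    edges = Counter()
    for objs in scenes:
        for a, b in combinations(sorted(set(objs)), 2):
            edges[(a, b)] += 1
    return edges


def select_pairs(edges, concepts, criterion, k=2):
    """Pick k concept pairs under an assumed distributional criterion."""
    if criterion == "frequent":
        # Highest co-occurrence first; ties broken alphabetically.
        ranked = sorted(edges, key=lambda p: (-edges[p], p))
    elif criterion == "rare":
        # Lowest non-zero co-occurrence first.
        ranked = sorted(edges, key=lambda p: (edges[p], p))
    elif criterion == "unseen":
        # Pairs that never co-occur in the annotations.
        ranked = [p for p in combinations(sorted(concepts), 2) if p not in edges]
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return ranked[:k]


def existence_query(obj):
    """Build a yes/no (discriminative) existence question for one concept."""
    return f"Is there a {obj} in the image?"


edges = build_cooccurrence(scenes)
concepts = {obj for objs in scenes for obj in objs}

for criterion in ("frequent", "rare", "unseen"):
    print(criterion, select_pairs(edges, concepts, criterion))
print(existence_query("frisbee"))
```

In a full pipeline, each selected combination would presumably drive the generation of a new image, and the queries would then be posed to the model under test. Both steps are outside the scope of this sketch.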

Taxonomy

Topics: Mental Health via Writing · Machine Learning in Healthcare