Multimodal Fact-Level Attribution for Verifiable Reasoning

David Wan; Han Wang; Ziyang Wang; Elias Stengel-Eskin; Hyunji Lee; Mohit Bansal

arXiv:2602.11509·cs.CL·May 8, 2026

TL;DR

This paper introduces MuRGAt, a benchmark for evaluating fact-level attribution in multimodal reasoning tasks involving complex inputs like video and audio, highlighting current models' tendency to hallucinate citations.

Contribution

The paper presents MuRGAt, a new benchmark and evaluation framework for assessing fact-level attribution in multimodal models. It addresses a limitation of earlier grounding benchmarks, which focus on simplified, observation-based scenarios or a limited set of modalities.

Findings

1. Strong models often hallucinate citations despite correct reasoning.
2. Increasing reasoning depth or structured grounding can reduce attribution accuracy.
3. Automatic evaluation correlates well with human judgments.

Abstract

Multimodal large language models (MLLMs) are increasingly used for real-world tasks involving multi-step reasoning and long-form generation, where reliability requires grounding model outputs in heterogeneous input sources and verifying individual factual claims. However, existing multimodal grounding benchmarks and evaluation methods focus on simplified, observation-based scenarios or limited modalities and fail to assess attribution in complex multimodal reasoning. We introduce MuRGAt (Multimodal Reasoning with Grounded Attribution), a benchmark for evaluating fact-level multimodal attribution in settings that require reasoning beyond direct observation. Given inputs spanning video, audio, and other modalities, MuRGAt requires models to generate answers with explicit reasoning and precise citations, where each citation specifies both modality and temporal segments. To enable reliable…
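As a rough illustration of what a citation that "specifies both modality and temporal segments" could look like in code, the sketch below models a citation as a modality label plus a time interval and checks it against reference evidence. Everything here is an assumption made for illustration: the `Citation` class, the `overlaps` rule, and the `citation_precision` score are hypothetical names and are not the MuRGAt data format or its evaluation metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation: a modality plus a time span in seconds."""
    modality: str   # e.g. "video" or "audio"
    start_s: float
    end_s: float


def overlaps(cited: Citation, evidence: Citation) -> bool:
    """True if both refer to the same modality and their time spans intersect."""
    return (
        cited.modality == evidence.modality
        and cited.start_s < evidence.end_s
        and evidence.start_s < cited.end_s
    )


def citation_precision(cited: list[Citation], gold: list[Citation]) -> float:
    """Fraction of cited segments that overlap at least one gold evidence segment.

    This is an illustrative score, not the benchmark's metric.
    """
    if not cited:
        return 0.0
    hits = sum(any(overlaps(c, g) for g in gold) for c in cited)
    return hits / len(cited)


# Usage: one claim cites a video span and an audio span, but only the
# video span is supported by the (made-up) gold evidence.
cited = [Citation("video", 10.0, 20.0), Citation("audio", 10.0, 20.0)]
gold = [Citation("video", 15.0, 30.0)]
print(citation_precision(cited, gold))  # prints 0.5
```

Requiring the modality to match before comparing time spans reflects the idea that a citation to the right moment in the wrong stream (audio instead of video, say) should not count as support.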

Code & Models

Repositories

meetdavidwan/murgat (GitHub)
