Piecing It All Together: Verifying Multi-Hop Multimodal Claims

Haoran Wang, Aman Rangapur, Xiongxiao Xu, Yueqing Liang, Haroon Gharwi, Carl Yang, Kai Shu

arXiv:2411.09547 · cs.CL · December 16, 2024

TL;DR

This paper introduces a new, challenging task and dataset for multi-hop multimodal claim verification, which requires models to reason over evidence from diverse sources such as text, images, and tables in order to verify claims.

Contribution

The paper presents MMCV, a dataset of 15,000 multi-hop multimodal claims generated and refined with large language models and human feedback, and establishes a human performance benchmark on a subset of it.

Findings

01. State-of-the-art models struggle with multi-hop reasoning on MMCV.

02. Model accuracy decreases as the number of reasoning hops increases.

03. Human performance sets a benchmark for future improvements.

Abstract

Existing claim verification datasets often do not require systems to perform complex reasoning or effectively interpret multimodal evidence. To address this, we introduce a new task: multi-hop multimodal claim verification. This task challenges models to reason over multiple pieces of evidence from diverse sources, including text, images, and tables, and determine whether the combined multimodal evidence supports or refutes a given claim. To study this task, we construct MMCV, a large-scale dataset comprising 15k multi-hop claims paired with multimodal evidence, generated and refined using large language models, with additional input from human feedback. We show that MMCV is challenging even for the latest state-of-the-art multimodal large language models, especially as the number of reasoning hops increases. Additionally, we establish a human performance benchmark on a subset of MMCV.…

Taxonomy

Topics: Natural Language Processing Techniques