Is a Picture Worth a Thousand Words? Adaptive Multimodal Fact-Checking with Visual Evidence Necessity

Jaeyoon Jung, Yejun Yoon, and Kunwoo Park

arXiv:2604.04692·cs.CL·May 14, 2026


TL;DR

This paper introduces AMuFC, a multimodal fact-checking framework that adaptively uses visual evidence based on necessity, improving accuracy over traditional methods that use visual data indiscriminately.

Contribution

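The work presents an adaptive multimodal fact-checking approach built on two collaborating vision-language models: an Analyzer that determines whether visual evidence is necessary for verifying a claim, and a Verifier that predicts claim veracity conditioned on the retrieved evidence and the Analyzer's assessment.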

Findings

01. Incorporating the Analyzer's assessment of visual evidence necessity improves verification accuracy.

02. The proposed framework outperforms existing multimodal fact-checking methods.

03. Code and datasets will be publicly released.

Abstract

Automated fact-checking is a crucial task that supports a responsible information ecosystem. While recent research has progressed from text-only to multimodal fact-checking, a prevailing assumption is that incorporating visual evidence universally improves performance. In this work, we challenge this assumption and show that the indiscriminate use of multimodal evidence can reduce accuracy. To address this challenge, we propose AMuFC, a multimodal fact-checking framework that employs two collaborative vision-language models with distinct roles for the adaptive use of visual evidence: an Analyzer determines whether visual evidence is necessary for claim verification, and a Verifier predicts claim veracity conditioned on both the retrieved evidence and the Analyzer's assessment. Experimental results on three datasets show that incorporating the Analyzer's assessment of visual evidence…
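The two-role design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Evidence` container, the function signatures, the verdict labels, and the keyword-based toy models are all hypothetical stand-ins for the vision-language models the paper uses. The only thing taken from the abstract is the control flow, in which an Analyzer first judges whether visual evidence is necessary and a Verifier then predicts veracity from the retrieved evidence together with that judgment.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Evidence:
    """Retrieved evidence for one claim (hypothetical container)."""
    texts: List[str]
    image_paths: List[str] = field(default_factory=list)


# Assumed interfaces: in the paper both roles are vision-language models.
Analyzer = Callable[[str, Evidence], bool]       # is visual evidence necessary?
Verifier = Callable[[str, Evidence, bool], str]  # veracity label


def adaptive_fact_check(claim: str, evidence: Evidence,
                        analyzer: Analyzer, verifier: Verifier) -> str:
    """Run the Analyzer, then condition the Verifier on its assessment."""
    needs_visual = analyzer(claim, evidence)
    return verifier(claim, evidence, needs_visual)


def toy_analyzer(claim: str, evidence: Evidence) -> bool:
    """Toy stand-in: flag claims that explicitly mention visual content."""
    visual_cues = ("photo", "image", "picture", "video")
    return any(cue in claim.lower() for cue in visual_cues)


def toy_verifier(claim: str, evidence: Evidence, needs_visual: bool) -> str:
    """Toy stand-in: labels are illustrative, not the paper's label set."""
    if needs_visual and not evidence.image_paths:
        return "not enough evidence"
    return "supported" if evidence.texts else "not enough evidence"


if __name__ == "__main__":
    ev = Evidence(texts=["A news report describing the event."])
    print(adaptive_fact_check("The city council passed the budget.",
                              ev, toy_analyzer, toy_verifier))
    print(adaptive_fact_check("This photo shows a flooded street.",
                              ev, toy_analyzer, toy_verifier))
```

Passing the two roles in as plain callables keeps the orchestration separate from the models, so either stand-in could be replaced by a real model call without changing `adaptive_fact_check`.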


Code & Models

Repositories

ssu-humane/AMuFC (GitHub)
