M4FC: a Multimodal, Multilingual, Multicultural, Multitask Real-World Fact-Checking Dataset

Jiahui Geng; Jonathan Tonglet; Iryna Gurevych

arXiv:2510.23508 · cs.CL · January 13, 2026


TL;DR

M4FC is a real-world, multilingual, multicultural dataset for multimodal fact-checking. It pairs 4,982 images with 6,980 claims and supports six fact-checking tasks.

Contribution

The paper introduces a real-world dataset for multimodal fact-checking that covers ten languages, diverse cultural and geographic contexts, and six tasks, and it reports baseline results with analysis.

Findings

01

Baseline results established for all six tasks.

02

Combining intermediate tasks improves verdict prediction performance.

03

Dataset and code are publicly available.

Abstract

Existing real-world datasets for multimodal fact-checking have multiple limitations: they contain few instances, focus on only one or two languages and tasks, suffer from evidence leakage, or rely on external sets of news articles for sourcing true claims. To address these shortcomings, we introduce M4FC, a new real-world dataset comprising 4,982 images paired with 6,980 claims. The images, verified by professional fact-checkers from 22 organizations, represent a diverse range of cultural and geographic contexts. Each claim is available in one or two out of ten languages. M4FC spans six multimodal fact-checking tasks: visual claim extraction, claimant intent prediction, fake image detection, image contextualization, location verification, and verdict prediction. We provide baseline results for all tasks and analyze how combining intermediate tasks influences verdict prediction…
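To make the structure described in the abstract concrete, the following is a minimal sketch of how one image-claim instance might be represented. The six task names and the dataset counts come from the abstract; the class names, field names, and language codes are hypothetical and are not taken from the released dataset or code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Task(Enum):
    """The six M4FC tasks as named in the abstract."""
    VISUAL_CLAIM_EXTRACTION = "visual claim extraction"
    CLAIMANT_INTENT_PREDICTION = "claimant intent prediction"
    FAKE_IMAGE_DETECTION = "fake image detection"
    IMAGE_CONTEXTUALIZATION = "image contextualization"
    LOCATION_VERIFICATION = "location verification"
    VERDICT_PREDICTION = "verdict prediction"


@dataclass
class Instance:
    """Hypothetical record layout: one image paired with one claim."""
    image_id: str
    claim_by_language: dict[str, str]  # language code -> claim text (assumed layout)
    labels: dict[Task, str] = field(default_factory=dict)  # per-task annotation

    def __post_init__(self) -> None:
        # The abstract states each claim is available in one or two languages.
        if not 1 <= len(self.claim_by_language) <= 2:
            raise ValueError("a claim must be available in one or two languages")


# Counts reported in the abstract.
NUM_IMAGES = 4982
NUM_CLAIMS = 6980
claims_per_image = NUM_CLAIMS / NUM_IMAGES  # about 1.4 claims per image
```

Keying the labels by task lets a single record carry annotations for any subset of the six tasks, which is one plausible way to study how intermediate tasks feed into verdict prediction.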

