Multilingual, Multimodal Pipeline for Creating Authentic and Structured Fact-Checked Claim Dataset

Z. Melce Hüsünbeyi; Virginie Mouilleron; Leonie Uhling; Daniel Foppe; Tatjana Scheffler; Djamé Seddah

arXiv:2601.07985·cs.CL·March 18, 2026

Open Access

TL;DR

This paper presents a pipeline for building multilingual, multimodal fact-checking datasets. It uses large language models to extract evidence, generate justifications, and structure claims together with their visual content, with the aim of improving interpretability and comparability.

Contribution

It introduces a pipeline for constructing structured, multilingual, multimodal fact-checking datasets with LLMs, addressing gaps in existing resources: missing multimodal evidence, missing structured annotations, and missing links between claims, evidence, and verdicts.

Findings

01. The pipeline enables detailed comparison of fact-checking across organizations.
02. The generated datasets improve the interpretability of fact-checking models.
03. Evaluation shows high-quality evidence extraction and justification generation.

Abstract

The rapid proliferation of misinformation across online platforms underscores the urgent need for robust, up-to-date, explainable, and multilingual fact-checking resources. However, existing datasets are limited in scope, often lacking multimodal evidence, structured annotations, and detailed links between claims, evidence, and verdicts. This paper introduces a data collection and processing pipeline that constructs multimodal fact-checking datasets in French and German by aggregating ClaimReview feeds, scraping full debunking articles, normalizing heterogeneous claim verdicts, and enriching them with structured metadata and aligned visual content. We used state-of-the-art large language models (LLMs) and multimodal LLMs for (i) evidence extraction under predefined evidence categories and (ii) justification generation that links evidence to verdicts. Evaluation…
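The verdict normalization step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each feed item is a schema.org ClaimReview object already parsed into a dict, with the verdict text in `reviewRating.alternateName`. The three-label target set, the variant spellings, and the helper names `normalize_verdict` and `to_record` are hypothetical and chosen only for illustration; the paper's actual verdict taxonomy is not given on this page.

```python
import unicodedata

# Hypothetical canonical labels and variants (assumption, not from the paper).
# Variants are stored accent-free and lowercase to match the output of _fold().
CANONICAL = {
    "false": {"faux", "falsch", "false", "infonde"},
    "misleading": {"trompeur", "irrefuhrend", "misleading"},
    "true": {"vrai", "richtig", "wahr", "true"},
}

# Flat lookup table: folded variant -> canonical label.
LOOKUP = {
    variant: label
    for label, variants in CANONICAL.items()
    for variant in variants
}


def _fold(text):
    """Strip accents and surrounding whitespace, then lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_accents.strip().lower()


def normalize_verdict(claim_review):
    """Map a ClaimReview dict's free-text rating to a canonical label."""
    rating = claim_review.get("reviewRating") or {}
    raw = rating.get("alternateName") or ""
    # Unknown or missing ratings fall back to a catch-all label.
    return LOOKUP.get(_fold(raw), "other")


def to_record(claim_review):
    """Build a flat record that keeps the raw verdict next to the normalized one."""
    rating = claim_review.get("reviewRating") or {}
    return {
        "claim": claim_review.get("claimReviewed", ""),
        "url": claim_review.get("url", ""),
        "raw_verdict": rating.get("alternateName", ""),
        "verdict": normalize_verdict(claim_review),
    }
```

Keeping the raw verdict alongside the normalized label preserves each organization's original wording, which is one plausible way to support the cross-organization comparison listed under Findings.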

Taxonomy

Topics: Misinformation and Its Impacts · Computational and Text Analysis Methods · Ethics and Social Impacts of AI