Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking

Yi-Ling Chung; Aurora Cobo; Pablo Serna

arXiv:2502.15419 · cs.CL · February 24, 2025

TL;DR

This paper introduces MultiSynFact, a large-scale multilingual fact-checking dataset generated by LLMs grounded in external knowledge from Wikipedia and filtered through claim validation, to improve automatic fact-checking across multiple languages, including low-resource ones.

Contribution

The paper presents MultiSynFact, the first large-scale multilingual fact-checking dataset, built with an LLM-based generation pipeline that includes claim validation, to support research in multilingual fact-checking.

Findings

01. MultiSynFact contains 2.2 million claim-source pairs.

02. The dataset improves fact-checking performance across languages.

03. An open-source framework facilitates further research.

Abstract

Robust automatic fact-checking systems have the potential to combat online misinformation at scale. However, most existing research primarily focuses on English. In this paper, we introduce MultiSynFact, the first large-scale multilingual fact-checking dataset containing 2.2M claim-source pairs designed to support Spanish, German, English, and other low-resource languages. Our dataset generation pipeline leverages Large Language Models (LLMs), integrating external knowledge from Wikipedia and incorporating rigorous claim validation steps to ensure data quality. We evaluate the effectiveness of MultiSynFact across multiple models and experimental settings. Additionally, we open-source a user-friendly framework to facilitate further research in multilingual fact-checking and dataset generation.
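The abstract describes a generate-then-validate flow: an LLM writes claims from Wikipedia source text, and a validation step decides which claim-source pairs are kept. The sketch below shows that control flow only, under stated assumptions. It is not the authors' implementation: the `ClaimSourcePair` fields, the label names, and the `toy_generate` and `toy_validate` stand-ins are all hypothetical, and a real pipeline would call an LLM and a proper validator in their place.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ClaimSourcePair:
    """Hypothetical record layout; the dataset's real schema may differ."""
    claim: str
    source: str
    language: str
    label: str


def generate_pairs(
    sources: Iterable[str],
    languages: Iterable[str],
    labels: Iterable[str],
    generate_claim: Callable[[str, str, str], str],
    validate_claim: Callable[[str, str, str], bool],
) -> List[ClaimSourcePair]:
    """Generate one claim per (source, language, label) and keep validated ones."""
    languages = list(languages)
    labels = list(labels)
    kept: List[ClaimSourcePair] = []
    for source in sources:
        for language in languages:
            for label in labels:
                claim = generate_claim(source, language, label)
                # Validation gate: rejected claims never reach the dataset.
                if validate_claim(claim, source, label):
                    kept.append(ClaimSourcePair(claim, source, language, label))
    return kept


# Toy stand-ins (assumptions): a real generator would prompt an LLM.
def toy_generate(source: str, language: str, label: str) -> str:
    # Only "supports" claims are produced here; other labels yield nothing.
    return f"[{language}] {source}" if label == "supports" else ""


def toy_validate(claim: str, source: str, label: str) -> bool:
    # Placeholder check: reject empty generations.
    return bool(claim.strip())


pairs = generate_pairs(
    sources=["Madrid is the capital of Spain."],
    languages=["es", "de", "en"],
    labels=["supports", "refutes"],
    generate_claim=toy_generate,
    validate_claim=toy_validate,
)
```

Passing the generator and validator as callables keeps the loop independent of any particular model or validation method, so either can be swapped without touching the pipeline logic.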

Code & Models

Repositories

Genaios/MultiSynFact (official)
