MultiCW: A Large-Scale Balanced Benchmark Dataset for Training Robust Check-Worthiness Detection Models
Martin Hyben, Sebastian Kula, Jan Cegin, Jakub Simko, Ivan Srba, Robert Moro

TL;DR
The paper introduces MultiCW, a large, balanced multilingual dataset for check-worthy claim detection, and benchmarks 3 fine-tuned multilingual transformers against 15 zero-shot LLMs to evaluate robustness across languages, domains, and styles.
Contribution
It provides a comprehensive, multilingual benchmark dataset for check-worthy claim detection and evaluates model performance, highlighting the strengths of fine-tuned models over zero-shot LLMs.
Findings
Fine-tuned models outperform zero-shot LLMs in claim classification.
Models show strong generalization across languages, domains, and styles.
MultiCW enables systematic comparison of models for fact-checking tasks.
Abstract
Large Language Models (LLMs) are beginning to reshape how media professionals verify information, yet automated support for detecting check-worthy claims, a key step in the fact-checking process, remains limited. We introduce the Multi-Check-Worthy (MultiCW) dataset, a balanced multilingual benchmark for check-worthy claim detection spanning 16 languages, 7 topical domains, and 2 writing styles. It consists of 123,722 samples, evenly distributed between noisy (informal) and structured (formal) texts, with balanced representation of check-worthy and non-check-worthy classes across all languages. To probe robustness, we also introduce an equally balanced out-of-distribution evaluation set of 27,761 samples in 4 additional languages. To provide baselines, we benchmark 3 common fine-tuned multilingual transformers against a diverse set of 15 commercial and open LLMs under zero-shot settings.…
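The balance property described in the abstract (equal shares of check-worthy and non-check-worthy samples within each language and writing style) can be checked with a few lines of code. The following is a minimal sketch only: the field names `lang`, `style`, and `label` are hypothetical and not taken from the dataset's actual schema, and the toy records are invented for illustration.

```python
from collections import Counter


def balance_report(samples):
    """Return the share of check-worthy samples per (language, style) group.

    `samples` is an iterable of dicts with hypothetical keys
    'lang', 'style', and 'label' (1 = check-worthy, 0 = not check-worthy).
    """
    totals = Counter()
    positives = Counter()
    for s in samples:
        key = (s["lang"], s["style"])
        totals[key] += 1
        positives[key] += s["label"]
    return {key: positives[key] / totals[key] for key in totals}


# Invented toy records, not drawn from MultiCW.
toy = [
    {"lang": "en", "style": "noisy", "label": 1},
    {"lang": "en", "style": "noisy", "label": 0},
    {"lang": "sk", "style": "structured", "label": 1},
    {"lang": "sk", "style": "structured", "label": 1},
]
report = balance_report(toy)
```

A share near 0.5 in every group would indicate the kind of class balance the abstract describes; groups far from 0.5 would be imbalanced.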
Taxonomy
Topics: Misinformation and Its Impacts · Computational and Text Analysis Methods · Hate Speech and Cyberbullying Detection
