EUvsDisinfo: A Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles

João A. Leite; Olesya Razuvayevskaya; Kalina Bontcheva; Carolina Scarton

arXiv:2406.12614·cs.CL·September 2, 2024

TL;DR

This paper presents EUvsDisinfo, a comprehensive multilingual dataset of pro-Kremlin disinformation and credible news, enabling analysis of disinformation patterns and training models for detection across languages over an eight-year span.

Contribution

It introduces the largest multilingual disinformation dataset with extensive topical and temporal coverage, and demonstrates its use in analyzing patterns and training detection models.

Findings

01 Disinformation content surged before the full-scale invasion of Ukraine in 2022

02 Language-specific disinformation patterns identified

03 Effective multilingual detection models trained

Abstract

This work introduces EUvsDisinfo, a multilingual dataset of disinformation articles originating from pro-Kremlin outlets, along with trustworthy articles from credible / less biased sources. It is sourced directly from the debunk articles written by experts leading the EUvsDisinfo project. Our dataset is the largest resource to date in terms of the overall number of articles and distinct languages. It also provides the largest topical and temporal coverage. Using this dataset, we investigate the dissemination of pro-Kremlin disinformation across different languages, uncovering language-specific patterns targeting certain disinformation topics. We further analyse the evolution of topic distribution over an eight-year period, noting a significant surge in disinformation content before the full-scale invasion of Ukraine in 2022. Lastly, we demonstrate the dataset's applicability in…

Code & Models

Repositories

JAugusto97/euvsdisinfo (PyTorch, official)
