TL;DR
This paper presents EUvsDisinfo, a comprehensive multilingual dataset of pro-Kremlin disinformation and credible news, enabling analysis of disinformation patterns and training models for detection across languages over an eight-year span.
Contribution
The paper introduces the largest multilingual pro-Kremlin disinformation dataset to date, measured by number of articles and distinct languages, with the widest topical and temporal coverage, and demonstrates its use for analysing dissemination patterns and training detection models.
Findings
Disinformation content surged before the full-scale invasion of Ukraine in 2022
Language-specific patterns identified, with certain disinformation topics targeted in particular languages
Effective multilingual detection models trained
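The per-language and per-year analyses summarised in these findings can be illustrated with a minimal sketch. It is not the authors' pipeline: the records are toy data, and the field names `language`, `year`, and `topic` are hypothetical, not taken from the EUvsDisinfo schema. It only shows one plausible way to compute topic shares per language and article counts per year.

```python
from collections import Counter, defaultdict

# Toy records for illustration only. The field names ("language", "year",
# "topic") are hypothetical and are not the EUvsDisinfo dataset schema.
records = [
    {"language": "en", "year": 2021, "topic": "ukraine"},
    {"language": "en", "year": 2021, "topic": "covid"},
    {"language": "ru", "year": 2021, "topic": "ukraine"},
    {"language": "ru", "year": 2022, "topic": "ukraine"},
    {"language": "ru", "year": 2022, "topic": "ukraine"},
    {"language": "de", "year": 2022, "topic": "sanctions"},
]


def topic_shares_by_language(rows):
    """Return {language: {topic: share of that language's articles}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["language"]][row["topic"]] += 1
    return {
        lang: {topic: n / sum(c.values()) for topic, n in c.items()}
        for lang, c in counts.items()
    }


def articles_per_year(rows):
    """Return {year: number of articles}, ordered by year."""
    return dict(sorted(Counter(row["year"] for row in rows).items()))


shares = topic_shares_by_language(records)
per_year = articles_per_year(records)
```

Comparing `shares` across languages is one simple way to surface language-specific topic emphasis, and `per_year` gives the kind of time series in which a surge would be visible.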
Abstract
This work introduces EUvsDisinfo, a multilingual dataset of disinformation articles originating from pro-Kremlin outlets, along with trustworthy articles from credible / less biased sources. It is sourced directly from the debunk articles written by experts leading the EUvsDisinfo project. Our dataset is the largest resource to date in terms of the overall number of articles and distinct languages. It also provides the largest topical and temporal coverage. Using this dataset, we investigate the dissemination of pro-Kremlin disinformation across different languages, uncovering language-specific patterns targeting certain disinformation topics. We further analyse the evolution of the topic distribution over an eight-year period, noting a significant surge in disinformation content before the full-scale invasion of Ukraine in 2022. Lastly, we demonstrate the dataset's applicability in…
