SCRum-9: Multilingual Stance Classification over Rumours on Social Media
Yue Li, Jake Vasilakes, Zhixue Zhao, Carolina Scarton

TL;DR
SCRum-9 is a multilingual dataset of 7,516 tweets in nine languages for rumour stance classification on social media. It is used to benchmark models and to explore how synthetic data affects classification performance.
Contribution
The paper introduces SCRum-9, the largest multilingual rumour stance dataset, and demonstrates how synthetic data can enhance model performance, especially for low-resource languages.
Findings
Large multilingual dataset improves rumour analysis
Synthetic data boosts performance of small models
Model predictions often align with human second-choice labels
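As a rough illustration of the last finding, the sketch below shows one way agreement between model predictions and annotators' first- and second-choice labels could be tallied. The record fields (`first`, `second`, `pred`), the stance labels, and the `alignment_rates` helper are all hypothetical and chosen for illustration; they are not the SCRum-9 schema or the authors' evaluation code.

```python
from collections import Counter

# Hypothetical records: field names and stance labels are illustrative
# assumptions, not the actual SCRum-9 schema.
examples = [
    {"first": "support", "second": "comment", "pred": "support"},
    {"first": "deny", "second": "query", "pred": "query"},
    {"first": "comment", "second": None, "pred": "deny"},
    {"first": "query", "second": "comment", "pred": "comment"},
]


def alignment_rates(records):
    """Share of predictions matching the annotator's first-choice label,
    the second-choice label, or neither."""
    keys = ("first", "second", "neither")
    total = len(records)
    if total == 0:
        # Avoid division by zero on an empty input.
        return {k: 0.0 for k in keys}
    counts = Counter()
    for r in records:
        if r["pred"] == r["first"]:
            counts["first"] += 1
        elif r["second"] is not None and r["pred"] == r["second"]:
            counts["second"] += 1
        else:
            counts["neither"] += 1
    return {k: counts[k] / total for k in keys}


rates = alignment_rates(examples)
print(rates)
```

A prediction counted under "second" is wrong against the primary label but still matches a stance the annotator considered plausible, which is why reporting it separately from "neither" can be informative.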
Abstract
We introduce SCRum-9, the largest multilingual Stance Classification dataset for Rumour analysis in 9 languages, containing 7,516 tweets from X. SCRum-9 goes beyond existing stance classification datasets by covering more languages, linking examples to more fact-checked claims (2.1k), and including confidence-related annotations from multiple annotators to account for intra- and inter-annotator variability. Annotations were made by at least two native speakers per language, totalling more than 405 hours of annotation and 8,150 dollars in compensation. Further, SCRum-9 is used to benchmark five large language models (LLMs) and two multilingual masked language models (MLMs) in In-Context Learning (ICL) and fine-tuning setups. This paper also innovates by exploring the use of multilingual synthetic data for rumour stance classification, showing that even LLMs with weak ICL performance can…
Taxonomy
Topics: Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection · Spam and Phishing Detection
