Fake News Classification in Urdu: A Domain Adaptation Approach for a Low-Resource Language

Muhammad Zain Ali; Bernhard Pfahringer; Tony Smith

arXiv:2512.22778 · cs.CL · December 30, 2025

Open Access

TL;DR

This paper investigates domain adaptation for fake news classification in Urdu, a low-resource language. Multilingual models are first adapted with domain-adaptive pretraining on an Urdu news corpus and then fine-tuned for classification, which improves performance for XLM-RoBERTa.

Contribution

The paper shows that domain-adaptive pretraining on Urdu news text before fine-tuning improves fake news detection for XLM-RoBERTa, while the results for mBERT are mixed.

Findings

01. Domain-adapted XLM-RoBERTa outperforms vanilla models.

02. Domain adaptation improves model performance on Urdu datasets.

03. Mixed results observed for domain-adapted mBERT.

Abstract

Misinformation on social media is a widely acknowledged issue, and researchers worldwide are actively engaged in its detection. However, low-resource languages such as Urdu have received limited attention in this domain. An obvious approach is to utilize a multilingual pretrained language model and fine-tune it for a downstream classification task, such as misinformation detection. However, these models struggle with domain-specific terms, leading to suboptimal performance. To address this, we investigate the effectiveness of domain adaptation before fine-tuning for fake news classification in Urdu, employing a staged training approach to optimize model generalization. We evaluate two widely used multilingual models, XLM-RoBERTa and mBERT, and apply domain-adaptive pretraining using a publicly available Urdu news corpus. Experiments on four publicly available Urdu fake news datasets…
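The domain-adaptive pretraining stage mentioned in the abstract is commonly implemented as continued masked language modeling on in-domain text. The following is a minimal sketch of the token-masking step only, assuming the standard BERT-style recipe (15% of tokens selected; of those, 80% replaced by the mask token, 10% by a random token, 10% left unchanged). The abstract does not state the masking settings the authors used, so the function name `mask_tokens`, the probabilities, and the toy vocabulary are illustrative assumptions, not the paper's configuration.

```python
import random

# XLM-RoBERTa's mask token is "<mask>"; mBERT uses "[MASK]".
MASK = "<mask>"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Build one masked-language-modeling example (hypothetical helper).

    Returns (inputs, labels). labels[i] holds the original token at
    positions selected for prediction and None everywhere else.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected; the model is not scored here
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK                # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels

# Toy usage with romanized placeholder tokens (not real corpus data).
sentence = ["yeh", "khabar", "durust", "nahin", "hai"]
toy_vocab = ["khabar", "akhbar", "hukumat", "shehar"]
masked, targets = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
print(masked, targets)
```

In a staged setup of the kind the abstract describes, examples like these would be used to continue pretraining the multilingual encoder on the Urdu news corpus, after which the adapted encoder is fine-tuned on the labeled fake news datasets.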

Taxonomy

Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Media Influence and Politics