Unified Large Language Models for Misinformation Detection in Low-Resource Linguistic Settings
Muhammad Islam, Javed Ali Khan, Mohammed Abaker, Ali Daud, Azeem Irshad

TL;DR
This paper introduces the first large, expert-verified Urdu fake news detection dataset and evaluates multiple large language models, proposing a unified model that outperforms existing approaches in resource-constrained settings.
Contribution
It provides a publicly available Urdu fake news dataset and a unified LLM, advancing fake news detection in low-resource languages.
Findings
The dataset is the first of its kind for Urdu fake news detection.
The unified LLM achieves higher accuracy and F1 scores than existing approaches.
Models are validated through human judgment.
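
For readers unfamiliar with the two metrics named in the findings, the following is a minimal sketch of how accuracy and F1 are typically computed for a binary fake/real task. The function name `accuracy_and_f1`, the label strings, and the sample labels are hypothetical and chosen for illustration; they are not the paper's data or evaluation code.

```python
def accuracy_and_f1(y_true, y_pred, positive="fake"):
    """Return (accuracy, F1) for a binary task, treating `positive` as the target class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    # F1 is the harmonic mean of precision and recall; defined as 0 when undefined.
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical labels, for illustration only (not data from the paper).
gold = ["fake", "real", "fake", "real", "fake"]
pred = ["fake", "fake", "real", "real", "fake"]
acc, f1 = accuracy_and_f1(gold, pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Reporting both is useful because accuracy alone can look strong on an imbalanced dataset, while F1 penalizes missed or wrongly flagged fake items.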
Abstract
The rapid expansion of social media platforms has significantly increased the dissemination of forged content and misinformation, making the detection of fake news a critical area of research. Although fact-checking efforts predominantly focus on English-language news, there is a noticeable gap in resources and strategies to detect fake news in regional languages, such as Urdu. Advanced Fake News Detection (FND) techniques rely heavily on large, accurately labeled datasets. However, FND in under-resourced languages like Urdu faces substantial challenges due to the scarcity of extensive corpora and the lack of validated lexical resources. Current Urdu fake news datasets are often domain-specific and inaccessible to the public. They also lack human verification, relying mainly on unverified English-to-Urdu translations, which compromises their reliability in practical applications. This study…
Taxonomy
Topics: Misinformation and Its Impacts · Spam and Phishing Detection
