Ax-to-Grind Urdu: Benchmark Dataset for Urdu Fake News Detection
Sheetal Harris, Jinshuo Liu, Hassan Jalil Hadi, Yue Cao

TL;DR
This paper introduces the Ax-to-Grind Urdu dataset, the largest publicly available collection of Urdu fake news, annotated by experts, to advance research in Urdu fake news detection using multilingual models.
Contribution
It provides the first large, manually verified Urdu fake news dataset covering multiple domains, addressing resource scarcity and enabling better model benchmarking.
Findings
An ensemble model achieves a high F1-score on the dataset.
The dataset covers Urdu news published from 2017 to 2023.
Benchmark results demonstrate the effectiveness of multilingual models.
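The findings report an F1-score for an ensemble model but do not say how the ensemble combines its members. The sketch below is for illustration only. It assumes a hard majority-vote ensemble over binary predictions (1 = fake, 0 = real) and computes F1 for the fake class as well as macro-F1. The function names and toy labels are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hard-voting ensemble (an assumption, not the paper's confirmed method).

    `predictions` holds one list of 0/1 labels per model. An example is
    labelled fake (1) only when more than half of the models say so,
    so ties resolve to real (0).
    """
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores for real (0) and fake (1)."""
    return (f1_score(y_true, y_pred, positive=0) + f1_score(y_true, y_pred, positive=1)) / 2


# Toy labels (hypothetical): three models that each make different mistakes.
y_true = [1, 0, 1, 1, 0, 0]
model_a = [1, 0, 0, 1, 0, 1]
model_b = [1, 1, 1, 0, 0, 0]
model_c = [0, 0, 1, 1, 1, 0]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)                             # [1, 0, 1, 1, 0, 0]
print(round(f1_score(y_true, model_a), 3))  # 0.667
print(f1_score(y_true, ensemble))           # 1.0
```

The three toy models make mistakes on different examples, so the vote recovers the true labels here. A real ensemble gains this way only when its members' errors are not strongly correlated.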
Abstract
Misinformation can seriously impact society, affecting everything from public opinion to institutional confidence and the political horizon of a state. The proliferation of Fake News (FN) on websites and Online Social Networks (OSNs) has increased sharply. Various fact-checking websites cover news in English but barely provide information about FN in regional languages. Thus, Urdu FN purveyors cannot be identified using fact-checking portals. State-of-the-art (SOTA) approaches to Fake News Detection (FND) rely on large, appropriately labelled datasets. FND in regional and resource-constrained languages lags behind because of the lack of large datasets and legitimate lexical resources. Previous datasets for Urdu FND are limited in size, domain-restricted and publicly unavailable. Where the news in them was translated from English into Urdu, the translations were not manually verified. In this paper, we curate and contribute the…
Taxonomy
Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Advanced Malware Detection Techniques
Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Residual Connection · Linear Layer · Dense Connections · Adam · Attention Dropout · Weight Decay · Layer Normalization
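The Methods tags list standard components of BERT-style transformer training. One of them, "Linear Warmup With Linear Decay", is a learning-rate schedule. The rate rises linearly from 0 to a peak over a warmup period, then falls linearly back to 0 at the final training step. The sketch below follows that common definition, which has the same shape as Hugging Face Transformers' `get_linear_schedule_with_warmup`. The function name, peak rate and step counts are illustrative assumptions, since the paper's training settings are not given here.

```python
def linear_warmup_linear_decay(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 toward peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr at the end of warmup to 0 at total_steps,
    # then stay clamped at 0 for any later step.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Illustrative settings (assumed, not from the paper).
PEAK_LR, WARMUP, TOTAL = 2e-5, 100, 1000
for s in (0, 50, 100, 550, 1000):
    print(s, linear_warmup_linear_decay(s, PEAK_LR, WARMUP, TOTAL))
```

With these settings, steps 50 and 550 both yield half the peak rate. The first is halfway through the warmup and the second is halfway through the decay.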
