Factorization of Fact-Checks for Low Resource Indian Languages
Shivangi Singhal, Rajiv Ratn Shah, Ponnurangam Kumaraguru

TL;DR
This paper introduces FactDRIL, a large-scale multilingual dataset for fact-checking in 11 low-resource Indian languages, aiming to combat fake news proliferation in regional languages.
Contribution
The paper presents the first extensive fact-checking dataset for Indian regional languages, covering multiple media types and domains, to facilitate research in low-resource language fake news detection.
Findings
Collected 31,635 samples across 11 languages over 7 months.
Characterized the dataset's multilingual, multimedia, and multi-domain aspects.
Provided potential use cases for the dataset in fake news detection.
Abstract
Advances in technology and the accessibility of the internet to each individual are revolutionizing real-time information. The liberty to express one's thoughts without passing through any credibility check is leading to the dissemination of fake content in the ecosystem. This can have disastrous effects on both individuals and society as a whole. The amplification of fake news is becoming rampant in India too. Debunked information often gets republished with a new description, claiming that it depicts a different incident. To curb such fabricated stories, it is necessary to investigate such duplicates and false claims made in public. The majority of studies on automatic fact-checking and fake news detection are restricted to English only. But for a country like India, where only 10% of the literate population speak English, the role of regional languages in spreading falsity cannot be…
Taxonomy
Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Hate Speech and Cyberbullying Detection
