BanFakeNews: A Dataset for Detecting Fake News in Bangla

Md Zobaer Hossain; Md Ashraful Rahman; Md Saiful Islam; Sudipta Kar

arXiv:2004.08789·cs.CL·April 21, 2020·70 cites

TL;DR

This paper introduces a new annotated dataset of approximately 50,000 Bangla news articles and develops a benchmark system using NLP techniques to detect fake news in this low-resource language.

Contribution

It provides the first large-scale Bangla fake news dataset, together with a benchmark detection system that explores both traditional linguistic features and neural network based methods.

Findings

01

The dataset enables effective fake news detection in Bangla.

02

The benchmark evaluates traditional linguistic features alongside neural network based methods.

03

Benchmark results establish a baseline for future research.

Abstract

Observing the damage that the rapid propagation of fake news can do in sectors such as politics and finance, the research community has turned its attention to automatic identification of fake news using linguistic analysis. However, such methods are largely being developed for English, while low-resource languages remain out of focus. But the risks spawned by fake and manipulative news are not confined by language. In this work, we propose an annotated dataset of ~50K news articles that can be used for building automated fake news detection systems for a low-resource language like Bangla. Additionally, we provide an analysis of the dataset and develop a benchmark system with state-of-the-art NLP techniques to identify Bangla fake news. To create this system, we explore traditional linguistic features and neural network based methods. We expect this dataset will be a valuable…
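To make the phrase "traditional linguistic features" concrete, here is a minimal sketch of one common baseline of that kind: character n-gram counts fed to a multinomial Naive Bayes classifier. The abstract does not say which features or classifiers the authors used, so the feature choice, the classifier, the names `char_ngrams` and `NgramNaiveBayes`, and the toy training strings are all assumptions made for illustration. They are not taken from the paper or its repository.

```python
from collections import Counter
import math


def char_ngrams(text, n_min=1, n_max=3):
    """Return overlapping character n-grams of a Unicode string (Bangla included)."""
    text = " ".join(text.split())  # collapse runs of whitespace
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical baseline: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.counts = {}                   # label -> Counter of n-gram counts
        self.doc_counts = Counter(labels)  # label -> number of training documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def log_scores(self, text):
        """Unnormalised log-probability of each label given the text."""
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, counter in self.counts.items():
            score = math.log(self.doc_counts[label] / n_docs)  # class prior
            for gram in char_ngrams(text):
                if gram in self.vocab:  # skip n-grams never seen in training
                    score += math.log(
                        (counter[gram] + 1) / (self.totals[label] + vocab_size)
                    )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy placeholder strings, NOT drawn from the BanFakeNews dataset.
toy_texts = ["aaaa", "bbbb"]
toy_labels = ["fake", "authentic"]
model = NgramNaiveBayes().fit(toy_texts, toy_labels)
print(model.predict("aa"))
```

Character n-grams need no tokenizer or part-of-speech tagger, which is one reason they are often tried first for low-resource languages. A real experiment would replace the toy strings with labeled articles and report scores on a held-out split.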

Code & Models

Repositories

Rowan1697/FakeNews
PyTorch · Official

Taxonomy

Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Sentiment Analysis and Opinion Mining