Similarity Detection Pipeline for Crawling a Topic Related Fake News Corpus

Inna Vogel; Jeong-Eun Choi; Meghana Meghana

arXiv:2009.13367·cs.CL·March 2, 2021

Open Access

TL;DR

This paper introduces a new German topic-related fake news corpus, a pipeline for crawling similar news articles, and fake news detection experiments in which SBERT sentence embeddings combined with a Bi-LSTM performed best.

Contribution

It presents what the authors describe as the first German topic-related fake news corpus, a pipeline for crawling similar news articles, and an evaluation of different learning approaches for fake news detection.

Findings

01

Achieved the best performance (k=0.88) with sentence-level SBERT embeddings and a Bi-LSTM

02

Provided a publicly available German fake news dataset

03

Demonstrated effectiveness of sentence embeddings in fake news detection

Abstract

Fake news detection is a challenging task that aims to reduce the human time and effort needed to check the truthfulness of news. Automated approaches to combat fake news, however, are limited by the lack of labeled benchmark datasets, especially in languages other than English. Moreover, many publicly available corpora have specific limitations that make them difficult to use. To address this problem, our contribution is threefold. First, we propose a new, publicly available German topic-related corpus for fake news detection. To the best of our knowledge, this is the first corpus of its kind. Second, to build this corpus, we developed a pipeline for crawling similar news articles. Third, we conduct different learning experiments to detect fake news. The best performance was achieved using sentence-level embeddings from SBERT in combination with a Bi-LSTM (k=0.88).
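The abstract does not say how the crawling pipeline decides that two articles are similar. As a rough illustration only, the sketch below assumes each article has already been mapped to a fixed-length embedding (for example by a sentence encoder such as SBERT) and keeps candidates whose cosine similarity to a seed article reaches a threshold. The function names, the toy vectors, and the 0.7 threshold are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def filter_similar(seed_vec, candidates, threshold=0.7):
    """Return (article_id, score) pairs with score >= threshold, best first.

    `candidates` maps a hypothetical article id to its embedding.
    The default threshold is an illustrative assumption, not a value from the paper.
    """
    scored = [(aid, cosine_similarity(seed_vec, vec)) for aid, vec in candidates.items()]
    kept = [(aid, score) for aid, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real sentence embeddings.
seed = [1.0, 0.0, 1.0]
candidates = {
    "article_a": [1.0, 0.0, 1.0],  # same direction as the seed
    "article_b": [1.0, 1.0, 0.0],  # partial overlap with the seed
    "article_c": [0.0, 1.0, 0.0],  # orthogonal to the seed
}
related = filter_similar(seed, candidates, threshold=0.7)
```

With these toy vectors only `article_a` passes the threshold. In a real setting the embeddings would come from the chosen encoder, and the threshold would need to be tuned on labeled pairs.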

Taxonomy

Topics: Misinformation and Its Impacts · Topic Modeling · Spam and Phishing Detection

Methods: Sentence-BERT