NELA-GT-2018: A Large Multi-Labelled News Dataset for The Study of Misinformation in News Articles

Jeppe Norregaard, Benjamin D. Horne, and Sibel Adali

arXiv:1904.01546 · cs.CY · April 3, 2019 · 38 citations

TL;DR

This paper introduces NELA-GT-2018, a dataset of 713k news articles from 194 news and media outlets, paired with source-level veracity and bias ratings from 8 assessment sites, to support research on misinformation.

Contribution

The paper provides a large, multi-labeled news dataset with source-level veracity assessments, enabling advanced analysis of misinformation and bias in news articles.

Findings

01

Dataset includes 713,000 articles from 194 outlets.

02

Sources are rated by 8 assessment sites on reliability, bias, transparency, adherence to journalistic standards, and consumer trust.

03

Dataset supports research on misinformation detection and media analysis.

Abstract

In this paper, we present a dataset of 713k articles collected between 02/2018-11/2018. These articles are collected directly from 194 news and media outlets including mainstream, hyper-partisan, and conspiracy sources. We incorporate ground truth ratings of the sources from 8 different assessment sites covering multiple dimensions of veracity, including reliability, bias, transparency, adherence to journalistic standards, and consumer trust. The NELA-GT-2018 dataset can be found at https://doi.org/10.7910/DVN/ULHLCB.
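Because the ground truth ratings describe sources rather than individual articles, a typical first step is to propagate each outlet's ratings to its articles. The following is a minimal sketch of that join, not the dataset's actual schema: the field names (`source`, `title`), the rating keys (`assessor_1_reliability`, `assessor_2_bias`), and the toy records are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are placeholders,
# not the real NELA-GT-2018 schema or content.
articles = [
    {"source": "outlet_a", "title": "Example headline 1"},
    {"source": "outlet_a", "title": "Example headline 2"},
    {"source": "outlet_b", "title": "Example headline 3"},
    {"source": "outlet_c", "title": "Example headline 4"},
]

# Hypothetical source-level ratings keyed by outlet. Not every
# assessment site is assumed to cover every outlet.
source_labels = {
    "outlet_a": {"assessor_1_reliability": "high", "assessor_2_bias": "center"},
    "outlet_b": {"assessor_1_reliability": "low"},
}


def label_articles(articles, source_labels):
    """Attach each outlet's source-level ratings to its articles."""
    labelled = []
    for article in articles:
        # Outlets without any rating get an empty dict.
        ratings = source_labels.get(article["source"], {})
        labelled.append({**article, "ratings": dict(ratings)})
    return labelled


def count_by_rating(labelled, key):
    """Count articles per value of one rating, with 'unrated' as fallback."""
    counts = defaultdict(int)
    for article in labelled:
        counts[article["ratings"].get(key, "unrated")] += 1
    return dict(counts)


labelled = label_articles(articles, source_labels)
print(count_by_rating(labelled, "assessor_1_reliability"))
# prints {'high': 2, 'low': 1, 'unrated': 1}
```

Keeping the ratings in a separate per-source mapping, rather than copying them into the stored articles, mirrors the source-level nature of the labels and makes it explicit that an article-level label obtained this way is inherited from its outlet.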

Code & Models

Datasets

ioverho/misinfo-general
dataset · 47 downloads

Taxonomy

Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Media Influence and Politics