NELA-GT-2018: A Large Multi-Labelled News Dataset for The Study of Misinformation in News Articles
Jeppe Norregaard, Benjamin D. Horne, and Sibel Adali

TL;DR
This paper introduces NELA-GT-2018, a dataset of 713k news articles from 194 outlets, paired with source-level veracity and bias ratings from 8 assessment sites, to facilitate research on misinformation.
Contribution
The paper provides a large, multi-labeled news dataset with source-level veracity assessments, enabling advanced analysis of misinformation and bias in news articles.
Findings
Dataset includes 713,000 articles from 194 outlets.
Sources are rated by 8 assessment sites on reliability, bias, transparency, adherence to journalistic standards, and consumer trust.
Dataset supports research on misinformation detection and media analysis.
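Because the ratings describe sources rather than individual articles, using them at the article level means copying each outlet's rating onto its articles. The following is a minimal sketch of that join, not the authors' code: the field names (`source`, `title`, `reliability`), the outlet names, and the values are hypothetical toy data and do not reflect the actual NELA-GT-2018 schema or labels.

```python
from collections import Counter

# Hypothetical toy records; field names and values are illustrative
# assumptions, not the real NELA-GT-2018 schema or labels.
articles = [
    {"source": "outlet_a", "title": "Story one"},
    {"source": "outlet_a", "title": "Story two"},
    {"source": "outlet_b", "title": "Story three"},
    {"source": "outlet_c", "title": "Story four"},
]

# Source-level ratings, one entry per outlet (outlet_c is left unrated
# to show how missing ratings are handled).
source_labels = {
    "outlet_a": {"reliability": "high"},
    "outlet_b": {"reliability": "low"},
}


def attach_source_labels(articles, source_labels):
    """Copy each outlet's rating onto its articles; unrated outlets get None."""
    labelled = []
    for article in articles:
        ratings = source_labels.get(article["source"], {})
        labelled.append({**article, "reliability": ratings.get("reliability")})
    return labelled


labelled = attach_source_labels(articles, source_labels)

# Count how many articles inherit each rating (None means unrated source).
counts = Counter(row["reliability"] for row in labelled)
```

Labels propagated this way are weak labels: every article from an outlet inherits the same rating, regardless of the content of the individual article.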
Abstract
In this paper, we present a dataset of 713k articles collected between 02/2018-11/2018. These articles are collected directly from 194 news and media outlets including mainstream, hyper-partisan, and conspiracy sources. We incorporate ground truth ratings of the sources from 8 different assessment sites covering multiple dimensions of veracity, including reliability, bias, transparency, adherence to journalistic standards, and consumer trust. The NELA-GT-2018 dataset can be found at https://doi.org/10.7910/DVN/ULHLCB.
Taxonomy
Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Media Influence and Politics
