Sampling the News Producers: A Large News and Feature Data Set for the Study of the Complex Media Landscape
Benjamin D. Horne, William Dron, Sara Khedr, Sibel Adali

TL;DR
This paper introduces a large, diverse political news dataset with extensive features to facilitate research on media strategies, bias, and misinformation, supporting various analytical use cases.
Contribution
It provides a comprehensive dataset of over 136,000 news articles with computed features, enabling systematic studies of news production and dissemination strategies.
Findings
The dataset includes over 136K articles from 92 news sources, collected over 7 months of 2017
Computed features cover political bias, persuasion, and misinformation
The paper demonstrates use cases such as news characterization and attribution
Abstract
The complexity and diversity of today's media landscape provide many challenges for researchers studying news producers. These producers use many different strategies to get their message believed by readers: through the writing styles they employ, by repetition across different media sources with or without attribution, and through other mechanisms that are yet to be studied deeply. To better facilitate systematic studies in this area, we present a large political news data set, containing over 136K news articles, from 92 news sources, collected over 7 months of 2017. These news sources are carefully chosen to include well-established and mainstream sources, maliciously fake sources, satire sources, and hyper-partisan political blogs. For each article, we also compute 130 content-based and social media engagement features drawn from a wide range of literature on political bias,…
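To make the idea of repetition across sources concrete, here is a minimal sketch of one way to flag near-duplicate articles published by different sources, using word-shingle Jaccard similarity. This is an illustrative assumption, not the authors' method: the function names (`shingles`, `jaccard`, `similar_pairs`), the shingle size `k=3`, and the threshold `0.5` are all hypothetical choices, and the sample articles are made up.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased tokens)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pairs(articles, threshold=0.5, k=3):
    """Find cross-source article pairs whose shingle overlap meets threshold.

    articles: list of (source_name, text) tuples (hypothetical input shape).
    Returns a list of (source_a, source_b, score) tuples.
    """
    sets = [(src, shingles(txt, k)) for src, txt in articles]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if sets[i][0] == sets[j][0]:
                continue  # only compare articles from different sources
            score = jaccard(sets[i][1], sets[j][1])
            if score >= threshold:
                pairs.append((sets[i][0], sets[j][0], score))
    return pairs


# Made-up example articles, for illustration only.
articles = [
    ("A", "The senate passed the bill on Tuesday after a long debate"),
    ("B", "The senate passed the bill on Tuesday after a long debate"),
    ("C", "Local team wins championship game in overtime thriller"),
]
matches = similar_pairs(articles)
```

The pairwise loop is quadratic in the number of articles, so at the scale of a 136K-article collection a real study would likely need an approximate technique such as MinHash; the sketch only shows the underlying comparison.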
