The POLUSA Dataset: 0.9M Political News Articles Balanced by Time and Outlet Popularity

Lukas Gebhard; Felix Hamborg

arXiv:2005.14024·cs.DL·May 29, 2020

TL;DR

POLUSA is a large, balanced dataset of 0.9 million US political news articles from 2017-2019, labeled by political leaning, designed to facilitate research on media bias, societal issues, and deep learning applications.

Contribution

The paper introduces POLUSA, a comprehensive, balanced dataset of US political news articles with political labels, addressing limitations of previous datasets for social science and NLP research.

Findings

01. Dataset covers 0.9M articles from 18 outlets
02. Balanced by time and outlet popularity
03. Labels outlets by political leaning
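
To make "balanced by time and outlet popularity" concrete, the following is a minimal sketch of one way such balancing could be done. It is not the authors' procedure: the function name `balance_articles`, the field names `month` and `outlet`, and the popularity weights are all hypothetical. The idea is to give every month the same article quota and to split that quota across outlets in proportion to a popularity weight.

```python
import random
from collections import defaultdict


def balance_articles(articles, popularity, per_month, seed=0):
    """Sample articles so each month gets the same quota, split across
    outlets in proportion to `popularity` (hypothetical weights).

    articles:   list of dicts with "month" and "outlet" keys (assumed schema)
    popularity: dict mapping outlet name -> positive weight
    per_month:  target number of articles per month
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    total_weight = sum(popularity.values())

    # Group the articles by (month, outlet).
    groups = defaultdict(list)
    for article in articles:
        groups[(article["month"], article["outlet"])].append(article)

    months = sorted({month for month, _ in groups})
    sample = []
    for month in months:
        for outlet, weight in popularity.items():
            # Each outlet's share of the monthly quota follows its weight.
            quota = round(per_month * weight / total_weight)
            pool = groups.get((month, outlet), [])
            # Never draw more articles than the outlet published that month.
            sample.extend(rng.sample(pool, min(quota, len(pool))))
    return sample
```

In this sketch, an outlet that published fewer articles than its quota in a given month simply contributes everything it has, so the result is only approximately balanced when the data is sparse.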

Abstract

News articles covering policy issues are an essential source of information in the social sciences and are also frequently used for other use cases, e.g., to train NLP language models. To derive meaningful insights from the analysis of news, large datasets are required that represent real-world distributions, e.g., with respect to the contained outlets' popularity, topically, or across time. Information on the political leanings of media publishers is often needed, e.g., to study differences in news reporting across the political spectrum, which is one of the prime use cases in the social sciences when studying media bias and related societal issues. Concerning these requirements, existing datasets have major flaws, resulting in redundant and cumbersome effort in the research community for dataset creation. To fill this gap, we present POLUSA, a dataset that represents the online media…

Code & Models

Repositories

lukasgebhard/Political-News-Filter
tf · Official
