Quantifying Generative Media Bias with a Corpus of Real-world and Generated News Articles

Filip Trhlik; Pontus Stenetorp

arXiv:2406.10773 · cs.CL · June 18, 2024

TL;DR

This paper introduces a new dataset and framework for quantifying political bias in news articles generated by large language models. It finds significant differences between base and instruction-tuned models, and shows that LLMs are also politically biased when used as classifiers.

Contribution

The paper creates a structured dataset of human-written and machine-generated news articles and develops methods to analyze political bias in LLMs within journalism.

Findings

1. Instruction-tuned LLMs exhibit consistent political bias.
2. Base and instruction-tuned models differ significantly.
3. LLMs display political bias even when used as classifiers.

Abstract

Large language models (LLMs) are increasingly being utilised across a range of tasks and domains, with a burgeoning interest in their application within the field of journalism. This trend raises concerns due to our limited understanding of LLM behaviour in this domain, especially with respect to political bias. Existing studies predominantly focus on LLMs undertaking political questionnaires, which offers only limited insights into their biases and operational nuances. To address this gap, our study establishes a new curated dataset that contains 2,100 human-written articles and utilises their descriptions to generate 56,700 synthetic articles using nine LLMs. This enables us to analyse shifts in properties between human-authored and machine-generated articles, with this study focusing on political bias, detecting it using both supervised models and LLMs. Our findings reveal…
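
As a quick consistency check on the corpus sizes quoted in the abstract, the sketch below works out how many synthetic articles each human-written article and each model would account for. The abstract states only the totals (2,100 human-written articles, 56,700 synthetic articles, nine LLMs); the even split across articles and models is an assumption made here for illustration, and the helper name `per_unit` is hypothetical.

```python
# Totals quoted in the abstract.
HUMAN_ARTICLES = 2_100
SYNTHETIC_ARTICLES = 56_700
NUM_MODELS = 9


def per_unit(total: int, units: int) -> int:
    """Return total // units, raising if the split is not even."""
    quotient, remainder = divmod(total, units)
    if remainder:
        raise ValueError(f"{total} does not split evenly into {units} units")
    return quotient


# Assumption (not stated in the abstract): generations are spread evenly
# across human-written articles and across models.
per_human_article = per_unit(SYNTHETIC_ARTICLES, HUMAN_ARTICLES)
per_model = per_unit(SYNTHETIC_ARTICLES, NUM_MODELS)
per_article_per_model = per_unit(per_human_article, NUM_MODELS)

print(per_human_article, per_model, per_article_per_model)
```

Under that even-split assumption, the totals divide cleanly, which would be consistent with a fixed number of generations per article for each model. The abstract itself does not say how the generations were distributed.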

Taxonomy

Topics: Media Influence and Politics · Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection

Methods: Balanced Selection · Focus