IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

Paul Röttger; Musashi Hinck; Valentin Hofmann; Kobi Hackenburg; Valentina Pyatkin; Faeze Brahman; Dirk Hovy

arXiv:2502.08395 · cs.CL · September 11, 2025

TL;DR

IssueBench is a large set of realistic prompts for measuring issue bias in LLM writing assistance. Results on IssueBench show that issue biases are common across models and, on some issues, lean towards US Democrat opinions. The benchmark is intended to support bias detection and mitigation.

Contribution

The paper introduces IssueBench, a benchmark of 2.49 million realistic English-language prompts for measuring issue bias in LLM writing assistance. The prompts are constructed from templates and political issues drawn from real user interactions.

Findings

1. Issue biases are common and persistent across the 10 state-of-the-art LLMs tested.

2. Biases tend to be similar across different models.

3. All models align more closely with US Democrat than Republican voter opinion on a subset of issues.

Abstract

Large language models (LLMs) are helping millions of users write texts about diverse issues, and in doing so expose users to different ideas and perspectives. This creates concerns about issue bias, where an LLM tends to present just one perspective on a given issue, which in turn may influence how users think about this issue. So far, it has not been possible to measure which issue biases LLMs manifest in real user interactions, making it difficult to address the risks from biased LLMs. Therefore, we create IssueBench: a set of 2.49m realistic English-language prompts to measure issue bias in LLM writing assistance, which we construct based on 3.9k templates (e.g. "write a blog about") and 212 political issues (e.g. "AI regulation") from real user interactions. Using IssueBench, we show that issue biases are common and persistent in 10 state-of-the-art LLMs. We also show that biases…
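The construction described in the abstract, pairing templates with issues, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name `build_prompts`, the second template, and the second issue are hypothetical, and simple string concatenation stands in for whatever procedure the authors actually use. The sketch also checks the rounded counts from the abstract: 3.9k templates times 212 issues gives roughly 0.83m pairs, so the 2.49m total implies a further factor of about three that this abstract does not spell out.

```python
from itertools import product

# Hypothetical stand-ins: only the first template and the first issue
# are taken from the abstract; the others are invented for illustration.
templates = ["write a blog about", "draft a short essay on"]
issues = ["AI regulation", "rent control"]


def build_prompts(templates, issues):
    """Pair every template with every issue, one prompt per combination."""
    return [f"{template} {issue}" for template, issue in product(templates, issues)]


prompts = build_prompts(templates, issues)
for prompt in prompts:
    print(prompt)

# Rounded counts as stated in the abstract.
n_templates = 3_900
n_issues = 212
n_prompts = 2_490_000

# Number of template-issue pairs, and the factor left unexplained by them.
pairs = n_templates * n_issues
ratio = n_prompts / pairs
print(pairs, round(ratio, 2))
```

The cross product is the simplest reading of "templates and issues"; the leftover factor may correspond to an additional dimension of variation in the full dataset, which would need to be confirmed against the paper itself.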

Taxonomy

Topics: Artificial Intelligence in Law

Methods: Sparse Evolutionary Training · ALIGN