IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance
Paul Röttger, Musashi Hinck, Valentin Hofmann, Kobi Hackenburg, Valentina Pyatkin, Faeze Brahman, Dirk Hovy

TL;DR
IssueBench is a large set of realistic prompts for measuring issue bias in LLM writing assistance. It shows that issue biases are common across models and, on some issues, align with US Democrat opinions, which can support bias detection and mitigation.
Contribution
The paper introduces IssueBench, a benchmark of 2.49 million prompts, constructed from real user interactions, for measuring issue bias in LLM writing assistance.
Findings
Issue biases are common and persistent across the 10 state-of-the-art LLMs evaluated.
Biases tend to be similar across different models.
All models show a bias towards US Democrat opinions on certain issues.
Abstract
Large language models (LLMs) are helping millions of users write texts about diverse issues, and in doing so expose users to different ideas and perspectives. This creates concerns about issue bias, where an LLM tends to present just one perspective on a given issue, which in turn may influence how users think about this issue. So far, it has not been possible to measure which issue biases LLMs manifest in real user interactions, making it difficult to address the risks from biased LLMs. Therefore, we create IssueBench: a set of 2.49m realistic English-language prompts to measure issue bias in LLM writing assistance, which we construct based on 3.9k templates (e.g. "write a blog about") and 212 political issues (e.g. "AI regulation") from real user interactions. Using IssueBench, we show that issue biases are common and persistent in 10 state-of-the-art LLMs. We also show that biases…
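The construction described in the abstract, combining prompt templates with issues, can be illustrated with a minimal sketch. Note that 3.9k templates times 212 issues gives roughly 0.83m pairs, so the 2.49m total implies about three variants per pair. The three framings used below (neutral, pro, con), the toy templates and issues, and the function name `build_prompts` are all assumptions made for illustration; they are not taken from the text above and are not the authors' code.

```python
from itertools import product

# Hypothetical toy inputs. The benchmark itself is described as using
# about 3.9k templates and 212 issues drawn from real user interactions.
TEMPLATES = [
    "write a blog about {issue}",
    "draft a short essay on {issue}",
]
ISSUES = ["AI regulation", "carbon taxes"]

# Assumption: three framing variants per (template, issue) pair.
# The abstract as shown here does not spell these out.
FRAMINGS = {
    "neutral": "{issue}",
    "pro": "the benefits of {issue}",
    "con": "the harms of {issue}",
}


def build_prompts(templates, issues, framings):
    """Return one prompt record per (template, issue, framing) combination."""
    prompts = []
    for template, issue, (name, framing) in product(
        templates, issues, framings.items()
    ):
        # Fill the framing with the issue, then the template with the result.
        framed_issue = framing.format(issue=issue)
        prompts.append(
            {
                "template": template,
                "issue": issue,
                "framing": name,
                "prompt": template.format(issue=framed_issue),
            }
        )
    return prompts


if __name__ == "__main__":
    for record in build_prompts(TEMPLATES, ISSUES, FRAMINGS):
        print(record["framing"], "|", record["prompt"])
```

With the toy inputs, the number of prompts is the product of the three list sizes (2 x 2 x 3), which is the same multiplicative structure that lets a few thousand templates and a few hundred issues yield millions of prompts.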
Taxonomy
Topics: Artificial Intelligence in Law
Methods: Sparse Evolutionary Training · ALIGN
