PoliticsBench: Benchmarking Political Values in Large Language Models with Multi-Turn Roleplay
Rohan Khetan, Ashna Khetan

TL;DR
This paper introduces PoliticsBench, a multi-turn roleplay framework for evaluating political bias in large language models. It finds that most of the tested models lean left and analyzes how that bias changes over the course of an interaction.
Contribution
It presents the first psychometric evaluation of political values in LLMs using multi-stage roleplay, highlighting biases and reasoning styles across eight prominent models.
Findings
Seven of the eight models leaned left; Grok leaned right.
Left-leaning models exhibited liberal traits alongside moderate conservative traits.
Biases showed slight variation across roleplay stages.
Abstract
While Large Language Models (LLMs) are increasingly used as primary sources of information, their potential for political bias may impact their objectivity. Existing benchmarks of LLM social bias primarily evaluate gender and racial stereotypes. When political bias is included, it is typically measured at a coarse level, neglecting the specific values that shape sociopolitical leanings. This study investigates political bias in eight prominent LLMs (Claude, Deepseek, Gemini, GPT, Grok, Llama, Qwen Base, Qwen Instruction-Tuned) using PoliticsBench: a novel multi-turn roleplay framework adapted from the EQ-Bench-v3 psychometric benchmark. We test whether commercially developed LLMs display a systematic left-leaning bias that becomes more pronounced in later stages of multi-stage roleplay. Through twenty evolving scenarios, each model reported its stance and determined its course of…
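To make the stage-wise comparison concrete, here is a minimal sketch of how per-stage stance scores could be aggregated to check whether a lean becomes more pronounced in later roleplay stages. Everything in it is an assumption for illustration: the record layout, the lean scale from -1.0 (left) to 1.0 (right), the placeholder model names, and the function names `mean_lean_by_stage` and `stage_drift` do not come from the paper, and the numbers are made up rather than reported results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record format: (model, scenario_id, stage, lean), where lean is
# an assumed score in [-1.0, 1.0]: negative = left, positive = right.
# The values below are made-up placeholders, not results from the paper.
records = [
    ("model_a", 1, 1, -0.25),
    ("model_a", 1, 2, -0.5),
    ("model_a", 2, 1, -0.75),
    ("model_a", 2, 2, -0.5),
    ("model_b", 1, 1, 0.5),
    ("model_b", 1, 2, 0.25),
]


def mean_lean_by_stage(rows):
    """Average the lean score for each (model, stage) pair across scenarios."""
    grouped = defaultdict(list)
    for model, _scenario, stage, lean in rows:
        grouped[(model, stage)].append(lean)
    return {key: mean(values) for key, values in grouped.items()}


def stage_drift(rows, model):
    """Return the last-stage mean lean minus the first-stage mean lean."""
    by_stage = {
        stage: value
        for (name, stage), value in mean_lean_by_stage(rows).items()
        if name == model
    }
    first, last = min(by_stage), max(by_stage)
    return by_stage[last] - by_stage[first]


if __name__ == "__main__":
    for name in ("model_a", "model_b"):
        print(name, stage_drift(records, name))
```

Under this assumed scale, a negative drift means the model moved further left between the first and last stage, and a drift near zero means its lean stayed roughly constant.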
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · AI in Service Interactions
