GermanPartiesQA: Benchmarking Commercial Large Language Models and AI Companions for Political Alignment and Sycophancy
Jan Batzner, Volker Stocker, Stefan Schmid, Gjergji Kasneci

TL;DR
This paper introduces GermanPartiesQA, a benchmark for evaluating the political alignment and bias of commercial large language models, and uses it to reveal their factual limitations, ideological tendencies, and steerability in role-playing scenarios.
Contribution
It presents a new benchmark dataset and evaluation methodology for assessing political alignment and biases in commercial LLMs used in decision support tools.
Findings
LLMs have limited accuracy in representing factual party positions.
Models exhibit consistent ideological alignment patterns.
Models' responses reflect persona-based steerability, not true sycophancy.
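To make the first finding concrete, the sketch below shows one way factual accuracy could be scored: a model role-plays a party, states that party's position on each statement, and the answer is compared with the party's official Voting Advice Application answer. This is a minimal illustration under stated assumptions, not the paper's method. The three-valued coding (`AGREE`, `NEUTRAL`, `DISAGREE`), the function `position_accuracy`, and the record layout are all hypothetical.

```python
from collections import defaultdict

# Hypothetical three-valued coding of VAA answers (assumption, not from the paper).
AGREE, NEUTRAL, DISAGREE = 1, 0, -1


def position_accuracy(records):
    """Per-party share of statements where the model's stated party position
    matches the party's official VAA answer.

    records: iterable of (party, official_position, model_position) tuples.
    Returns a dict mapping each party to an accuracy in [0, 1].
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for party, official, predicted in records:
        totals[party] += 1
        hits[party] += int(official == predicted)  # exact match counts as correct
    return {party: hits[party] / totals[party] for party in totals}
```

For example, with made-up records where the model reproduces one of two positions for "Party A" and both for "Party B", `position_accuracy` returns `{"Party A": 0.5, "Party B": 1.0}`. Counting only exact matches is a deliberately strict choice; a real evaluation might also report partial credit for neutral answers.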
Abstract
Large language models (LLMs) are increasingly shaping citizens' information ecosystems. Products incorporating LLMs, such as chatbots and AI Companions, are now widely used for decision support and information retrieval, including in sensitive domains. This raises concerns about hidden biases and about their growing potential to shape individual decisions and public opinion. This paper introduces GermanPartiesQA, a benchmark of 418 political statements from German Voting Advice Applications across 11 elections, and uses it to evaluate six commercial LLMs. We evaluate their political alignment through role-playing experiments with political personas. Our evaluation reveals three specific findings: (1) Factual limitations: LLMs show limited ability to accurately generate factual party positions, particularly for centrist parties. (2) Model-specific ideological alignment: We identify consistent alignment patterns and…
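Because the benchmark statements come from Voting Advice Applications, a model's ideological alignment can be pictured the way a VAA matches a voter to parties: compare the model's own answers with each party's positions and rank the parties by agreement. The sketch below assumes a simple point scheme loosely modeled on VAAs such as Wahl-O-Mat (2 points for identical answers, 1 when exactly one side is neutral, 0 for opposite positions). The summary above does not state the paper's actual metric, so the scheme and the names `statement_points`, `agreement`, and `rank_parties` are illustrative assumptions.

```python
# Hypothetical three-valued coding of answers (assumption, not from the paper).
AGREE, NEUTRAL, DISAGREE = 1, 0, -1


def statement_points(answer, party_position):
    # Assumed VAA-style scoring: identical -> 2, one side neutral -> 1, opposite -> 0.
    if answer == party_position:
        return 2
    if NEUTRAL in (answer, party_position):
        return 1
    return 0


def agreement(model_answers, party_positions):
    """Agreement in [0, 1] between a model's answers and one party's positions."""
    if len(model_answers) != len(party_positions):
        raise ValueError("answers and positions must align statement by statement")
    if not model_answers:
        raise ValueError("need at least one statement")
    earned = sum(statement_points(a, p) for a, p in zip(model_answers, party_positions))
    return earned / (2 * len(model_answers))  # normalize by the maximum possible points


def rank_parties(model_answers, positions_by_party):
    """Parties sorted from highest to lowest agreement with the model."""
    scores = {
        party: agreement(model_answers, positions)
        for party, positions in positions_by_party.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With hypothetical model answers `[AGREE, DISAGREE, NEUTRAL, AGREE]`, a party answering `[AGREE, DISAGREE, AGREE, AGREE]` earns 7 of 8 points (0.875), while one answering `[DISAGREE, AGREE, NEUTRAL, DISAGREE]` earns 2 of 8 (0.25), so it ranks second. Repeating this across models and elections is one way "consistent alignment patterns" could surface, under the assumptions above.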
