Reducing Political Manipulation with Consistency Training

Long Phan; Devin Kim; Alexander Pan; Alice Blair; Adam Khoja; Dan Hendrycks

arXiv:2605.22771·cs.CL·May 22, 2026

TL;DR

This paper identifies covert political bias in large language models, introduces metrics to measure it, and proposes a reinforcement learning-based training method called Political Consistency Training to mitigate this bias while maintaining helpfulness.

Contribution

The paper introduces the concept of covert political bias, develops two metrics to quantify it (Sentiment Consistency and Helpfulness Consistency), and proposes an RL training approach, Political Consistency Training (PCT), to reduce it in LLMs.

Findings

01 PCT substantially reduces covert political bias.

02 PCT preserves overall helpfulness of LLMs.

03 The approach generalizes to held-out benchmarks.

Abstract

Large language models (LLMs) exhibit systematic political bias across a variety of sensitive contexts. We find that LLMs handle counterpart topics from opposing political sides asymmetrically. We refer to this phenomenon as covert political bias and identify 7 categories of techniques through which it operates. We propose two metrics for covert bias: Sentiment Consistency measures symmetry in rhetoric and framing across paired political prompts; Helpfulness Consistency measures symmetric depth and engagement. To reduce both types of covert bias, we introduce Political Consistency Training (PCT), an RL training method with two complementary paradigms: Sentiment Consistency Training and Helpfulness Consistency Training. We show that PCT preserves overall helpfulness, substantially reduces covert political bias, and generalizes to held-out benchmarks. We release our work at…
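
To make the idea of a paired-prompt consistency metric concrete, here is a minimal sketch. The abstract does not give the metric's formula, so everything below is an assumption for illustration: the function name `pairwise_consistency`, the use of scores in [0, 1], and the choice of one minus the mean absolute gap are all hypothetical and are not the authors' definition. Each pair holds the score (for example, a sentiment or helpfulness rating) assigned to a model's response on one prompt and on its counterpart from the opposing political side.

```python
from statistics import mean


def pairwise_consistency(pairs):
    """Hypothetical symmetry score for paired political prompts.

    `pairs` is a list of (score_a, score_b) tuples, where each score is
    assumed to lie in [0, 1] and the two scores come from counterpart
    prompts on opposing political sides.

    Returns a value in [0, 1]; 1.0 means every pair was scored identically.
    This is an illustrative formula, not the one used in the paper.
    """
    if not pairs:
        raise ValueError("need at least one pair of scores")
    for a, b in pairs:
        if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
            raise ValueError("scores are assumed to lie in [0, 1]")
    # Average absolute gap between counterpart scores, then invert so that
    # higher means more symmetric treatment.
    gaps = [abs(a - b) for a, b in pairs]
    return 1.0 - mean(gaps)


# Example with made-up scores: the first pair is treated symmetrically,
# the second is not.
example_pairs = [(0.8, 0.8), (0.9, 0.5)]
score = pairwise_consistency(example_pairs)
```

Under this assumed formula, a model that answers both sides of every pair with equal sentiment or depth scores 1.0, and larger asymmetries pull the score toward 0.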

Code & Models

Repositories

https://political-manipulation.ai
