On the Relationship between Truth and Political Bias in Language Models

Suyash Fulay; William Brannon; Shrestha Mohanty; Cassandra Overney; Elinor Poole-Dayan; Deb Roy; Jad Kabbara

arXiv:2409.05283 · cs.CL · December 2, 2024

TL;DR

This paper investigates how training reward models for truthfulness affects their political bias. It finds a tendency toward left-leaning bias that varies with the training dataset and grows with model size.

Contribution

It provides the first analysis of the relationship between truthfulness and political bias in language models, showing that optimizing for truthfulness can increase political bias.

Findings

1. Reward models trained for truthfulness tend to be left-leaning.

2. Existing open-source reward models exhibit a similar left-leaning bias.

3. The bias is larger for larger models.

Abstract

Language model alignment research often attempts to ensure that models are not only helpful and harmless, but also truthful and unbiased. However, optimizing these objectives simultaneously can obscure how improving one aspect might impact the others. In this work, we focus on analyzing the relationship between two concepts essential in both language model alignment and political science: truthfulness and political bias. We train reward models on various popular truthfulness datasets and subsequently evaluate their political bias. Our findings reveal that optimizing reward models for truthfulness on these datasets tends to result in a left-leaning political bias. We also find that existing open-source reward models (i.e., those trained on standard human preference datasets) already show a similar bias and that the bias is larger for larger models. These results raise important questions…
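
As a rough illustration of what "evaluating political bias" in a reward model could look like, the sketch below compares the average reward a scorer assigns to left-leaning statements against the average it assigns to right-leaning ones. This is not the authors' evaluation protocol, which this page does not describe. The function names `political_lean` and `toy_score`, the statement lists, and the difference-of-means metric are all hypothetical, chosen only to make the idea concrete.

```python
from statistics import mean
from typing import Callable, Sequence


def political_lean(
    score: Callable[[str], float],
    left_statements: Sequence[str],
    right_statements: Sequence[str],
) -> float:
    """Mean score on left-leaning statements minus mean score on right-leaning ones.

    A positive value means the scorer prefers the left-leaning set on average.
    This difference-of-means metric is an assumption made for illustration.
    """
    left = mean(score(s) for s in left_statements)
    right = mean(score(s) for s in right_statements)
    return left - right


def toy_score(text: str) -> float:
    # Hypothetical stand-in for a reward model: scores text by word count.
    # A real study would call a trained reward model here instead.
    return float(len(text.split()))


if __name__ == "__main__":
    # Placeholder strings, not real political statements or real data.
    left = ["placeholder left statement one", "placeholder left statement two"]
    right = ["placeholder right statement one", "placeholder right statement two"]
    print(political_lean(toy_score, left, right))
```

In practice `toy_score` would be replaced by a call into an actual reward model, and the two statement lists would need to be matched in topic and style so that the difference reflects political slant rather than unrelated features such as length.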

Code & Models

Repositories

sfulay/truth_politics (official)

Videos

On the Relationship between Truth and Political Bias in Language Models · Underline

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques · Translation Studies and Practices