Whose Opinions Do Language Models Reflect?
Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, Tatsunori Hashimoto

TL;DR
This paper introduces a framework and dataset to evaluate how well language models reflect diverse human opinions, revealing significant misalignments with US demographic groups across various topics.
Contribution
The work presents a novel quantitative framework and the OpinionsQA dataset for assessing LM opinion alignment with human demographic groups, highlighting existing biases and gaps.
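As a rough illustration of what comparing LM opinions with a demographic group's opinions can look like, the sketch below scores two answer distributions for a single multiple-choice survey question. It assumes the answer choices are ordinal and equally spaced, and it uses a normalized 1-Wasserstein distance as the similarity measure. That choice of measure, the function names `wasserstein_1d` and `alignment`, and the example numbers are all assumptions made for illustration, not details taken from this summary.

```python
from itertools import accumulate


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two distributions over the same
    ordered, equally spaced answer choices (hypothetical helper)."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same answer choices")
    # For 1-D distributions on a unit-spaced grid, the distance equals the
    # sum of absolute differences between the cumulative distributions.
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def alignment(p, q):
    """Map the distance to a score in [0, 1]; 1 means identical
    distributions, 0 means all mass sits at opposite ends of the scale."""
    n = len(p)
    if n < 2:
        raise ValueError("need at least two answer choices")
    return 1.0 - wasserstein_1d(p, q) / (n - 1)


# Made-up example: a four-option question, LM vs. one demographic group.
lm_answers = [0.10, 0.20, 0.30, 0.40]
group_answers = [0.40, 0.30, 0.20, 0.10]
score = alignment(lm_answers, group_answers)
```

Averaging such a per-question score over many questions would give one number per (model, group) pair, which is the kind of quantity a framework like this needs in order to compare groups and topics.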
Findings
Current LMs show substantial opinion misalignment with US demographic groups.
The misalignment is on par with the Democrat-Republican divide on climate change.
Explicit steering towards demographic groups does not fully correct opinion misalignments.
Abstract
Language models (LMs) are increasingly being used in open-ended contexts, where the opinions reflected by LMs in response to subjective queries can have a profound impact, both on user satisfaction, as well as shaping the views of society at large. In this work, we put forth a quantitative framework to investigate the opinions reflected by LMs -- by leveraging high-quality public opinion polls and their associated human responses. Using this framework, we create OpinionsQA, a new dataset for evaluating the alignment of LM opinions with those of 60 US demographic groups over topics ranging from abortion to automation. Across topics, we find substantial misalignment between the views reflected by current LMs and those of US demographic groups: on par with the Democrat-Republican divide on climate change. Notably, this misalignment persists even after explicitly steering the LMs towards…
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods
