Revealing Fine-Grained Values and Opinions in Large Language Models

Dustin Wright, Arnav Arora, Nadav Borenstein, Srishti Yadav, Serge Belongie, and Isabelle Augenstein

arXiv:2406.19238 · cs.CL · September 1, 2025

TL;DR

This paper investigates the values and opinions embedded in large language models by analyzing their responses to the propositions of the Political Compass Test. A large-scale analysis of both the stances and the plain-text justifications uncovers biases and recurring reasoning patterns.

Contribution

The paper introduces a dataset of 156k LLM responses to the Political Compass Test, generated by 6 LLMs under 420 prompt variations, and proposes a method for identifying recurring tropes in the models' justifications.
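To make the idea of a recurring trope concrete, here is a minimal sketch of grouping near-identical justification sentences that appear under different model and prompt combinations. It is a toy stand-in, not the authors' method: the token-overlap similarity, the greedy clustering, the names `find_tropes`, `threshold`, and `min_sources`, and the example sentences are all assumptions made for illustration.

```python
import re


def tokens(sentence):
    """Lowercased word set; a crude stand-in for a semantic sentence representation."""
    return frozenset(re.findall(r"[a-z']+", sentence.lower()))


def jaccard(a, b):
    """Token-set overlap in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_tropes(responses, threshold=0.5, min_sources=2):
    """Greedily cluster sentences by token overlap.

    responses: list of (source_id, sentence) pairs, where source_id is a
    hypothetical label for the model/prompt that produced the sentence.
    A cluster counts as a trope only if its sentences come from at least
    min_sources distinct sources, i.e. the phrasing recurs.
    """
    clusters = []  # each cluster: {"rep": token set of first member, "members": [...]}
    for source, sentence in responses:
        toks = tokens(sentence)
        for cluster in clusters:
            if jaccard(toks, cluster["rep"]) >= threshold:
                cluster["members"].append((source, sentence))
                break
        else:
            clusters.append({"rep": toks, "members": [(source, sentence)]})
    return [
        c["members"]
        for c in clusters
        if len({src for src, _ in c["members"]}) >= min_sources
    ]


# Invented example sentences, not taken from the dataset.
responses = [
    ("model_a/prompt_1", "Free markets drive innovation and growth."),
    ("model_b/prompt_7", "Free markets drive growth and innovation."),
    ("model_a/prompt_2", "Everyone deserves access to healthcare."),
]
tropes = find_tropes(responses)
for members in tropes:
    print([sentence for _, sentence in members])
```

The first two sentences share every token and come from different sources, so they form one trope; the third appears only once and is dropped. A realistic pipeline would likely replace the token-set similarity with sentence embeddings and a proper clustering algorithm.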

Findings

01

Demographic features added to the prompt shift LLM responses on the PCT, indicating bias.

02

Similar justifications recur across models and prompts.

03

PCT results differ between closed-form and open-ended responses.

Abstract

Uncovering latent values and opinions embedded in large language models (LLMs) can help identify biases and mitigate potential harm. Recently, this has been approached by prompting LLMs with survey questions and quantifying the stances in the outputs towards morally and politically charged statements. However, the stances generated by LLMs can vary greatly depending on how they are prompted, and there are many ways to argue for or against a given position. In this work, we propose to address this by analysing a large and robust dataset of 156k LLM responses to the 62 propositions of the Political Compass Test (PCT) generated by 6 LLMs using 420 prompt variations. We perform coarse-grained analysis of their generated stances and fine-grained analysis of the plain text justifications for those stances. For fine-grained analysis, we propose to identify tropes in the responses: semantically…
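The "156k" figure is consistent with the other counts in the abstract if every model answers every proposition under every prompt variation. That full crossing is an assumption here, not something the abstract states; the variable names are illustrative.

```python
# Counts quoted in the abstract.
n_propositions = 62        # Political Compass Test propositions
n_models = 6               # LLMs queried
n_prompt_variations = 420  # prompt variations per proposition

# Assumption: a full grid, one response per (proposition, model, prompt) triple.
total_responses = n_propositions * n_models * n_prompt_variations
print(f"{total_responses:,}")  # 156,240
```

Under that assumption the product is 156,240, which rounds to the reported 156k.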

Code & Models

Repositories

copenlu/llm-pct-tropes
PyTorch · Official

Datasets

copenlu/llm-pct-tropes
dataset · 19 dl

Videos

Revealing Fine-Grained Values and Opinions in Large Language Models · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Political Compass Test (PCT) prompting · Trope identification in justifications