Fine-Grained Interpretation of Political Opinions in Large Language Models

Jingyu Hu; Mengyue Yang; Mengnan Du; Weiru Liu

arXiv:2506.04774 · cs.CL · June 6, 2025

TL;DR

This paper develops a multi-dimensional framework and interpretable vectors to analyze and influence the internal political opinions of large language models, improving transparency and control over their political responses.

Contribution

The paper introduces a four-dimensional political learning framework and constructs a dataset for fine-grained political concept vector learning, enabling better interpretability of, and intervention in, LLMs' political opinions.

Findings

01 Vectors can disentangle political concept confounds.

02 Vectors show good generalization and robustness in out-of-distribution (OOD) settings.

03 Vectors can be used to intervene in LLMs and alter the political leanings of their responses.
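
To make the idea of detecting and intervening with a concept vector concrete, here is a minimal sketch on synthetic data. It is not the paper's method: it assumes a simple difference-of-means direction, a common representation engineering baseline, and uses random arrays in place of real LLM hidden states. The function names `concept_vector`, `detect`, and `intervene`, the dimension, and the scale `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np


def concept_vector(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """Unit-norm difference between the mean activations of two classes.

    Assumption: a difference-of-means direction stands in for whatever
    vector-learning procedure the paper actually uses.
    """
    direction = pos.mean(axis=0) - neg.mean(axis=0)
    return direction / np.linalg.norm(direction)


def detect(hidden: np.ndarray, vector: np.ndarray) -> np.ndarray:
    """Score each hidden state by its projection onto the concept vector."""
    return hidden @ vector


def intervene(hidden: np.ndarray, vector: np.ndarray, alpha: float) -> np.ndarray:
    """Shift hidden states along the concept vector by a fixed amount alpha."""
    return hidden + alpha * vector


# Synthetic stand-in for hidden states: two classes separated along axis 0.
rng = np.random.default_rng(0)
dim = 16
true_direction = np.zeros(dim)
true_direction[0] = 1.0
pos = rng.normal(size=(200, dim)) + 2.0 * true_direction
neg = rng.normal(size=(200, dim)) - 2.0 * true_direction

vector = concept_vector(pos, neg)
scores_pos = detect(pos, vector)
scores_neg = detect(neg, vector)

# Steering: push the "neg" states toward the "pos" side of the direction.
alpha = 4.0
steered = intervene(neg, vector, alpha)
scores_steered = detect(steered, vector)

print("mean score (pos):    ", scores_pos.mean())
print("mean score (neg):    ", scores_neg.mean())
print("mean score (steered):", scores_steered.mean())
```

Because the vector has unit norm, adding `alpha * vector` raises every projection score by exactly `alpha`, which is why a single scalar can act as a steering strength in this toy setting. With a real model, the hidden states would come from a chosen layer, and the intervention would be applied during the forward pass.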

Abstract

Studies of LLMs' political opinions mainly rely on evaluations of their open-ended responses. Recent work indicates that LLMs' responses can be misaligned with their internal intentions. This motivates us to probe LLMs' internal mechanisms to help uncover their internal political states. Additionally, we find that analyses of LLMs' political opinions often rely on single-axis concepts, which can lead to concept confounds. In this work, we extend the single-axis view to multiple dimensions and apply interpretable representation engineering techniques for more transparent LLM political concept learning. Specifically, we design a four-dimensional political learning framework and construct a corresponding dataset for fine-grained political concept vector learning. These vectors can be used to detect and intervene in LLM internals. Experiments are conducted on eight…

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Computational and Text Analysis Methods · Topic Modeling