Alignment Revisited: Are Large Language Models Consistent in Stated and Revealed Preferences?

Zhuojun Gu; Quan Wang; Shuchu Han

arXiv:2506.00751 · cs.AI · June 3, 2025

Open Access

TL;DR

This paper investigates the divergence between what large language models state as their preferences and how they actually behave in context, revealing significant variability that impacts trust and ethical deployment.

Contribution

It introduces a formal method to measure preference deviations in LLMs and demonstrates how minor prompt changes can significantly alter model choices across different preference categories.
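As a rough illustration of what measuring such a deviation could look like, the sketch below computes, per preference category, the fraction of forced binary choices in which the option picked in a contextualized scenario differs from the option picked for the general principle. This is an assumption-laden toy, not the paper's formal measure: the `Trial` record, the `deviation_rate` function, the option labels, and the category names `risk` and `moral` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One paired observation (hypothetical structure, not from the paper)."""
    category: str  # preference category, e.g. "risk" (illustrative name)
    stated: str    # option chosen when asked about the general principle
    revealed: str  # option chosen in the contextualized scenario


def deviation_rate(trials):
    """Return, per category, the fraction of trials where revealed != stated."""
    totals, flips = {}, {}
    for t in trials:
        totals[t.category] = totals.get(t.category, 0) + 1
        flips[t.category] = flips.get(t.category, 0) + (t.stated != t.revealed)
    return {c: flips[c] / totals[c] for c in totals}


# Made-up data purely for demonstration.
trials = [
    Trial("risk", "A", "A"),
    Trial("risk", "A", "B"),
    Trial("moral", "A", "B"),
    Trial("moral", "B", "B"),
    Trial("moral", "B", "B"),
    Trial("moral", "A", "A"),
]
rates = deviation_rate(trials)
print(rates)
```

A simple flip rate like this treats every disagreement equally; a measure built on choice probabilities rather than single answers would capture graded shifts, which this sketch does not attempt.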

Findings

01. LLMs often show preference divergence based on prompt format.

02. Minor prompt modifications can pivot LLM decisions.

03. Preference deviations are prevalent across multiple LLMs.

Abstract

Recent advances in Large Language Models (LLMs) highlight the need to align their behaviors with human values. A critical, yet understudied, issue is the potential divergence between an LLM's stated preferences (its reported alignment with general principles) and its revealed preferences (inferred from decisions in contextualized scenarios). Such deviations raise fundamental concerns for the interpretability, trustworthiness, reasoning transparency, and ethical deployment of LLMs, particularly in high-stakes applications. This work formally defines and proposes a method to measure this preference deviation. We investigate how LLMs may activate different guiding principles in specific contexts, leading to choices that diverge from previously stated general principles. Our approach involves crafting a rich dataset of well-designed prompts as a series of forced binary choices and…

Taxonomy

Topics: Natural Language Processing Techniques