Mind the Gap: How Elicitation Protocols Shape the Stated-Revealed Preference Gap in Language Models
Pranav Mahajan, Ihor Kendiukhov, Syed Hussain, Lydia Nottingham

TL;DR
This paper investigates how different elicitation protocols influence the observed preference gap in language models, revealing that protocol design significantly impacts the correlation between stated and revealed preferences.
Contribution
The paper systematically analyzes how elicitation protocols affect SvR correlation across 24 LMs, highlighting the importance of accounting for indeterminate preferences.
Findings
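Allowing neutrality and abstention during stated preference elicitation filters out weak signals and substantially improves SvR correlation.
Also allowing abstention in revealed preferences drives the correlation to near-zero or negative values, because neutrality rates become high.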
System prompt steering does not reliably enhance SvR correlation.
Abstract
Recent work identifies a stated-revealed (SvR) preference gap in language models (LMs): a mismatch between the values models endorse and the choices they make in context. Existing evaluations rely heavily on binary forced-choice prompting, which entangles genuine preferences with artifacts of the elicitation protocol. We systematically study how elicitation protocols affect SvR correlation across 24 LMs. Allowing neutrality and abstention during stated preference elicitation allows us to exclude weak signals, substantially improving Spearman's rank correlation (ρ) between volunteered stated preferences and forced-choice revealed preferences. However, further allowing abstention in revealed preferences drives ρ to near-zero or negative values due to high neutrality rates. Finally, we find that system prompt steering using stated preferences during revealed preference…
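To make the filtering step concrete, here is a minimal sketch of computing Spearman's rank correlation between stated and revealed preferences after dropping items where the stated response was neutral or an abstention. It is not the authors' pipeline: the per-item numeric scores, the "neutral" and "abstain" labels, the example items, and every function name are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side has no rank variation
    return cov / sqrt(vx * vy)


def svr_correlation(stated, revealed, excluded=("neutral", "abstain")):
    """Correlate stated vs. revealed scores over items with a volunteered stated preference.

    `stated` maps item -> numeric score, or a label in `excluded` (hypothetical encoding).
    `revealed` maps item -> numeric score from forced-choice prompts (hypothetical encoding).
    """
    kept = [k for k in stated if k in revealed and stated[k] not in excluded]
    return spearman_rho([stated[k] for k in kept], [revealed[k] for k in kept])


# Made-up illustrative data, not results from the paper.
stated = {"honesty": 0.9, "privacy": "neutral", "helpfulness": 0.6,
          "caution": 0.2, "speed": "abstain"}
revealed = {"honesty": 0.8, "privacy": 0.9, "helpfulness": 0.7,
            "caution": 0.1, "speed": 0.3}
print(svr_correlation(stated, revealed))
```

In this toy data the three items with a volunteered stated preference are ranked identically on both sides, so the filtered correlation is 1.0. The same filter applied on the revealed side could leave very few items, which is one intuition for why high neutrality rates can make ρ unstable.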
