Evaluating Alignment of Behavioral Dispositions in LLMs
Amir Taubenfeld, Zorik Gekhman, Lior Nezry, Omri Feldman, Natalie Harris, Shashir Reddy, Romina Stella, Ariel Goldstein, Marian Croak, Yossi Matias, Amir Feder

TL;DR
This paper introduces a framework using adapted psychological questionnaires and Situational Judgment Tests to evaluate how well large language models' behavioral dispositions align with human preferences and tendencies.
Contribution
It develops a novel methodology for assessing LLMs' behavioral dispositions through validated SJTs based on psychological questionnaires, enabling systematic comparison with human preferences.
Findings
LLMs are often overconfident in their responses when human consensus is low.
Smaller models deviate significantly from human consensus in high-agreement scenarios.
Some LLMs encourage emotional expression even where human consensus favors composure.
Abstract
As LLMs integrate into our daily lives, understanding their behavior becomes essential. In this work, we focus on behavioral dispositions, the underlying tendencies that shape responses in social contexts, and introduce a framework to study how closely the dispositions expressed by LLMs align with those of humans. Our approach is grounded in established psychological questionnaires but adapts them for LLMs by transforming human self-report statements into Situational Judgment Tests (SJTs). These SJTs assess behavior by eliciting natural recommendations in realistic user-assistant scenarios. We generate 2,500 SJTs, each validated by three human annotators, and collect preferred actions from 10 annotators per SJT, from a large pool of 550 participants. In a comprehensive study involving 25 LLMs, we find that models often do not reflect the distribution of human preferences: (1) in…
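As a rough illustration of the setup described in the abstract, the sketch below shows one way the preferred actions of 10 annotators on a single SJT could be reduced to a consensus level and compared with a model's recommendation. It is not the authors' evaluation code: the action labels "A" and "B", the function names, and the 0.8 agreement threshold are all hypothetical choices made for this example.

```python
from collections import Counter

def consensus(votes):
    """Return (majority_action, agreement) for one SJT's annotator votes.

    `votes` is a list of action labels, one per annotator (hypothetical format).
    Agreement is the share of annotators who chose the majority action.
    """
    counts = Counter(votes)
    action, n = counts.most_common(1)[0]
    return action, n / len(votes)

def matches_consensus(model_choice, votes, threshold=0.8):
    """Compare a model's recommended action with the human majority.

    Returns None when agreement is below `threshold` (an assumed cutoff),
    because no single action can be treated as the human consensus there.
    """
    action, agreement = consensus(votes)
    if agreement < threshold:
        return None
    return model_choice == action

# Ten annotators, nine of whom prefer action "A" (made-up data).
high_agreement = ["A"] * 9 + ["B"]
print(consensus(high_agreement))
print(matches_consensus("B", high_agreement))
```

In low-agreement cases such as an even split, `matches_consensus` returns `None` rather than a verdict. A fuller analysis would compare the model's distribution over actions with the human distribution instead.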
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Social Robot Interaction and HRI
