Evaluating Large Language Models with Psychometrics
Yuan Li, Yue Huang, Hongyi Wang, Ying Cheng, Xiangliang Zhang, James Zou, Lichao Sun

TL;DR
This paper introduces a psychometric benchmark to evaluate large language models' psychological traits, revealing discrepancies between self-reports and actual responses, and highlighting challenges in adapting human-centric tests for AI.
Contribution
The paper develops a comprehensive psychometric assessment framework for LLMs, identifying key psychological constructs and evaluating model behavior across diverse scenarios.
Findings
Discrepancies between LLMs' self-reports and response patterns.
Some human-designed tests are unreliable for LLMs.
Insights into LLMs' psychological trait assessment.
Abstract
Large Language Models (LLMs) have demonstrated exceptional capabilities in solving various tasks, progressively evolving into general-purpose assistants. The increasing integration of LLMs into society has sparked interest in whether they exhibit psychological patterns, and whether these patterns remain consistent across different contexts -- questions that could deepen the understanding of their behaviors. Inspired by psychometrics, this paper presents a comprehensive benchmark for quantifying psychological constructs of LLMs, encompassing psychological dimension identification, assessment dataset design, and assessment with results validation. Our work identifies five key psychological constructs -- personality, values, emotional intelligence, theory of mind, and self-efficacy -- assessed through a suite of 13 datasets featuring diverse scenarios and item types. We uncover…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods
