You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments
Bangzhao Shu, Lechen Zhang, Minje Choi, Lavinia Dunagan, Lajanugen Logeswaran, Moontae Lee, Dallas Card, David Jurgens

TL;DR
This study critically assesses the reliability of large language models in responding to psychometric questions, revealing significant inconsistencies and limitations in current prompting methods for understanding model perceptions.
Contribution
The paper introduces a comprehensive dataset and analysis framework to evaluate LLMs' consistency and reliability in answering psychometric instrument questions.
Findings
Most LLMs show low negation consistency.
Minor prompt variations significantly reduce answer accuracy.
Current prompting practices are insufficient for reliable model assessment.
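As a rough illustration of the first finding, negation consistency can be thought of as how often a model gives opposite answers to a statement and to its negation. The sketch below is one plausible way to score that, not the paper's own definition, which this summary does not give. The function name, the yes/no answer format, and the sample answers are all assumptions made for illustration.

```python
def negation_consistency(pairs):
    """Fraction of (original, negated) answer pairs where the answer flips.

    Assumption: each answer is already normalized to "yes" or "no".
    A consistent model answers a statement and its negation differently.
    """
    if not pairs:
        raise ValueError("need at least one answer pair")
    flipped = sum(1 for original, negated in pairs if original != negated)
    return flipped / len(pairs)


# Hypothetical answers: (answer to the statement, answer to its negation).
answers = [("yes", "no"), ("yes", "yes"), ("no", "yes"), ("no", "no")]
score = negation_consistency(answers)
```

A score of 1.0 would mean every answer flipped under negation; a model that gives the same answer regardless of negation scores 0.0.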
Abstract
The versatility of Large Language Models (LLMs) on natural language understanding tasks has made them popular for research in social sciences. To properly understand the properties and innate personas of LLMs, researchers have performed studies that involve using prompts in the form of questions that ask LLMs about particular opinions. In this study, we take a cautionary step back and examine whether the current format of prompting LLMs elicits responses in a consistent and robust manner. We first construct a dataset that contains 693 questions encompassing 39 different instruments of persona measurement on 115 persona axes. Additionally, we design a set of prompts containing minor variations and examine LLMs' capabilities to generate answers, as well as prompt variations to examine their consistency with respect to content-level variations such as switching the order of response…
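The content-level variation the abstract mentions, switching the order of response options, can be sketched as follows. This is a minimal example under stated assumptions: the prompt template, the helper names (build_prompt, order_variants, is_order_consistent), and the sample question are hypothetical and are not taken from the paper's dataset or code.

```python
LABELS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def build_prompt(question, options):
    """Format a question with lettered options (hypothetical template)."""
    lines = [question]
    lines += [f"{LABELS[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def order_variants(question, options):
    """Return (prompt, options) pairs for the original and reversed option order."""
    reversed_options = list(reversed(options))
    return [
        (build_prompt(question, options), list(options)),
        (build_prompt(question, reversed_options), reversed_options),
    ]


def is_order_consistent(letter_answers, variants):
    """True if the chosen letters map to the same option text in every variant."""
    chosen = {
        opts[LABELS.index(letter)]
        for letter, (_, opts) in zip(letter_answers, variants)
    }
    return len(chosen) == 1


# Hypothetical item; a real run would send each prompt to a model.
variants = order_variants("I enjoy meeting new people.", ["Agree", "Disagree"])
consistent = is_order_consistent(["A", "B"], variants)    # "Agree" both times
inconsistent = is_order_consistent(["A", "A"], variants)  # "Agree" vs "Disagree"
```

Comparing the option text rather than the letter is the point of the check: a model that always answers "A" looks stable on the surface but is actually choosing a different option once the order is reversed.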
Taxonomy
Topics: Mental Health Research Topics · Personality Traits and Psychology
Methods: Sparse Evolutionary Training
