Challenging the Validity of Personality Tests for Large Language Models
Tom Sühr, Florian E. Dorner, Samira Samadi, Augustin Kelava

TL;DR
This paper presents evidence that personality tests designed for humans are not valid for large language models (LLMs): LLM responses deviate systematically from human responses and do not align with human personality structures.
Contribution
The study provides empirical evidence that existing human personality assessments are invalid for LLMs, highlighting the need for new evaluation methods.
Findings
LLMs often affirm both sides of reverse-coded items
Prompts designed to steer LLMs toward particular personality types do not produce a clear separation into personality factors
LLM responses to personality tests differ systematically from human responses
Abstract
With large language models (LLMs) like GPT-4 appearing to behave increasingly human-like in text-based interactions, it has become popular to attempt to evaluate personality traits of LLMs using questionnaires originally developed for humans. While reusing measures is a resource-efficient way to evaluate LLMs, careful adaptations are usually required to ensure that assessment results are valid even across human subpopulations. In this work, we provide evidence that LLMs' responses to personality tests systematically deviate from human responses, implying that the results of these tests cannot be interpreted in the same way. Concretely, reverse-coded items ("I am introverted" vs. "I am extraverted") are often both answered affirmatively. Furthermore, variation across prompts designed to "steer" LLMs to simulate particular personality types does not follow the clear separation into five…
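The reverse-coding point can be made concrete with a short sketch. It assumes a 1 to 5 Likert scale and a hypothetical "I am extraverted" / "I am introverted" item pair. The helper names (`reverse_score`, `pair_inconsistency`, `affirms_both`) and the agreement threshold of 4 are illustrative assumptions, not the paper's scoring protocol. On a reverse-coded item a response r is re-scored as 6 - r. A coherent respondent who strongly agrees with the first statement and strongly disagrees with the second therefore gets the same trait score from both items. Affirming both statements yields maximally conflicting scores.

```python
"""Sketch: scoring a reverse-coded item pair on a 1-5 Likert scale.

Assumptions (illustrative, not the paper's protocol): responses are integers
from 1 (strongly disagree) to 5 (strongly agree); the item pair, the helper
names, and the agreement threshold are hypothetical.
"""

SCALE_MIN, SCALE_MAX = 1, 5


def reverse_score(response: int) -> int:
    """Map a response on a reverse-coded item back onto the trait direction."""
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response {response} outside {SCALE_MIN}..{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - response


def pair_inconsistency(forward: int, reversed_item: int) -> int:
    """Absolute gap between the forward item and the re-scored reverse item.

    0 means both answers imply the same trait level; SCALE_MAX - SCALE_MIN
    means they imply opposite extremes.
    """
    return abs(forward - reverse_score(reversed_item))


def affirms_both(forward: int, reversed_item: int, threshold: int = 4) -> bool:
    """True if the respondent agrees (>= threshold) with both opposite statements."""
    return forward >= threshold and reversed_item >= threshold


# Hypothetical pair: "I am extraverted" (forward) vs. "I am introverted"
# (reverse-coded with respect to extraversion).
consistent = {"extraverted": 5, "introverted": 1}     # coherent answer pattern
contradictory = {"extraverted": 5, "introverted": 5}  # both statements affirmed

for label, answers in [("consistent", consistent), ("contradictory", contradictory)]:
    fwd, rev = answers["extraverted"], answers["introverted"]
    print(label, pair_inconsistency(fwd, rev), affirms_both(fwd, rev))
```

Running it prints `consistent 0 False` and `contradictory 4 True`. A score computed by simply averaging the two re-scored items would hide the contradiction (both patterns average to a defined value), which is why a per-pair consistency check is useful before interpreting trait scores.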
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention
