Measuring Psychological Depth in Language Models
Fabrice Harel-Canada, Hanyu Zhou, Sreya Muppalla, Zeynep Yildiz, Miryung Kim, Amit Sahai, Nanyun Peng

TL;DR
This paper introduces the Psychological Depth Scale (PDS), a new framework for evaluating the emotional and narrative complexity of stories generated by language models, validated through human and automated assessments.
Contribution
The paper presents the PDS, a framework rooted in literary theory, along with methods for automating it. Together they enable systematic evaluation of LLMs' storytelling depth from a reader's perspective.
Findings
Humans can reliably evaluate stories using PDS (Krippendorff's alpha 0.72)
GPT-4o with Mixture-of-Personas (MoP) prompting achieves an average Spearman correlation of 0.51 with human judgments
GPT-4 stories match or surpass highly rated human stories from Reddit
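Krippendorff's alpha, the agreement statistic behind the 0.72 figure, compares the disagreement observed among raters with the disagreement expected by chance: alpha = 1 - D_o / D_e. The sketch below illustrates that computation, assuming PDS ratings are treated as interval data with a squared-difference distance; this page does not state which measurement level the authors used, and the function name and the rater-by-story input layout are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Sequence


def krippendorff_alpha_interval(ratings: Sequence[Sequence[Optional[float]]]) -> float:
    """Interval-level Krippendorff's alpha (illustrative sketch).

    ratings[r][u] is rater r's score for unit (story) u, or None if missing.
    """
    n_units = len(ratings[0])
    # Keep only "pairable" units: those scored by at least two raters.
    units: List[List[float]] = []
    for u in range(n_units):
        values = [row[u] for row in ratings if row[u] is not None]
        if len(values) >= 2:
            units.append(values)
    n = sum(len(values) for values in units)
    if n < 2:
        raise ValueError("need at least one unit rated by two or more raters")

    # Observed disagreement: squared differences between ordered pairs of
    # ratings within each unit, weighted by 1 / (m_u - 1), normalised by n.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(values, 2)) / (len(values) - 1)
        for values in units
    ) / n

    # Expected disagreement: squared differences over all ordered pairs of
    # pairable ratings, ignoring which unit each rating came from.
    pooled = [x for values in units for x in values]
    d_e = sum((a - b) ** 2 for a, b in permutations(pooled, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("all ratings identical; alpha is undefined")
    return 1.0 - d_o / d_e
```

Units rated by fewer than two raters are dropped because they contribute no pairs. When every rater gives each story the same score, D_o is zero and alpha is 1.0.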
Abstract
Evaluations of creative stories generated by large language models (LLMs) often focus on objective properties of the text, such as its style, coherence, and diversity. While these metrics are indispensable, they do not speak to a story's subjective, psychological impact from a reader's perspective. We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that measures an LLM's ability to produce authentic and narratively complex stories that provoke emotion, empathy, and engagement. We empirically validate our framework by showing that humans can consistently evaluate stories based on PDS (0.72 Krippendorff's alpha). We also explore techniques for automating the PDS to easily scale future analyses. GPT-4o, combined with a novel Mixture-of-Personas (MoP) prompting strategy, achieves an average Spearman correlation of 0.51 with human judgment while…
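The 0.51 figure above is an average Spearman rank correlation between LLM-assigned and human PDS ratings. The sketch below shows one way to compute such an average, assuming human scores are already aggregated to one value per story, ties receive average ranks, and the average is taken over PDS components. The component keys (for example "empathy") and the dict-of-lists layout are hypothetical, not the authors' code.

```python
from statistics import mean
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant ratings")
    return cov / (var_x * var_y) ** 0.5


def average_spearman(
    human: Dict[str, Sequence[float]], model: Dict[str, Sequence[float]]
) -> float:
    """Mean per-component Spearman correlation between human and model scores."""
    return mean(spearman(human[c], model[c]) for c in human)
```

Ranking before correlating makes the measure depend only on how each judge orders the stories, not on whether the model's scores sit systematically higher or lower than the humans'.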
Taxonomy
Topics: Computational and Text Analysis Methods
Methods: Residual Connection · Softmax · Layer Normalization · Focus · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention
