Evaluating Developmental Cognition Capabilities of LLMs
Xiao Xiao, Hayoun Noh, Mar Gonzalez-Franco

TL;DR
This paper introduces the Developmental Sentence Completion Test (DSCT), a scalable 20-item instrument grounded in Kegan's constructive-developmental theory, and uses it to evaluate developmental signal in LLM responses.
Contribution
It presents the DSCT as a novel, scalable instrument for assessing developmental signal in LLM responses, addressing the gap left by expert interviews that do not scale and by sentence-completion instruments that are proprietary, lengthy, or invasive.
Findings
LLMs recover simulated persona labels with high accuracy.
Human-LLM agreement on real responses is fair, with stronger within-neighborhood agreement.
Larger, newer models tend to generate responses with higher developmental-stage signal.
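To make the agreement finding concrete, here is a minimal sketch of how exact agreement, within-neighborhood agreement, and a chance-corrected statistic could be computed for two raters. It is not the paper's scoring code. It assumes stages are coded as integers, that "within-neighborhood" means the two labels differ by at most one stage, and that Cohen's kappa is a suitable chance-corrected measure; the summary above confirms none of these. The label lists are invented for illustration only.

```python
from collections import Counter


def exact_agreement(a, b):
    """Fraction of items where both raters assign the same stage."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def adjacent_agreement(a, b, tolerance=1):
    """Fraction of items where the two stages differ by at most `tolerance`.

    Assumption: this is one plausible reading of "within-neighborhood".
    """
    return sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa: observed agreement corrected for chance."""
    n = len(a)
    p_observed = exact_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    p_expected = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    if p_expected == 1:
        return 1.0  # Both raters used one identical label throughout.
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical stage labels for eight responses (illustrative data only).
human = [2, 3, 3, 4, 3, 4, 2, 5]
llm = [3, 3, 4, 4, 3, 3, 2, 4]

print("exact:", exact_agreement(human, llm))
print("adjacent:", adjacent_agreement(human, llm))
print("kappa:", round(cohens_kappa(human, llm), 3))
```

On this toy data the raters match exactly on half of the items but never disagree by more than one stage, which is the pattern the finding describes: modest exact agreement alongside much stronger agreement once neighboring stages are counted as matches.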
Abstract
Conversational AI is increasingly personalized around users' preferences, histories, goals, and knowledge, but much less around how users interpret and take up model outputs to construct and understand their reality. We draw on Robert Kegan's constructive-developmental theory as a complementary lens on this dimension. Existing methods for assessing developmental stage in the Keganian tradition rely either on expert interviews that do not scale or on sentence-completion instruments that are proprietary, lengthy, or invasive. To make this perspective tractable for LLM evaluation, we introduce the Developmental Sentence Completion Test (DSCT), a 20-item instrument designed to elicit developmental signal in self-administered text. Throughout, we treat the resulting labels as characterizations of stage-like structure in elicited responses, not as validated person-level developmental stage.…
