A validity-guided workflow for robust large language model research in psychology

Zhicheng Lin

arXiv:2507.04491 · cs.HC · July 8, 2025

TL;DR

This paper proposes a six-stage, validity-guided workflow for conducting robust and reliable research using large language models in psychology, emphasizing validation, transparency, and systematic analysis.

Contribution

The paper introduces a workflow that integrates psychometrics with causal inference to strengthen the validity of LLM-based psychological research.

Findings

01

Validated computational instruments for LLMs in psychology

02

Demonstrated that systematic validation distinguishes genuine phenomena from artifacts

03

Provided guidelines for transparent and reliable LLM research practices

Abstract

Large language models (LLMs) are rapidly being integrated into psychological research as research tools, evaluation targets, human simulators, and cognitive models. However, recent evidence reveals severe measurement unreliability: Personality assessments collapse under factor analysis, moral preferences reverse with punctuation changes, and theory-of-mind accuracy varies widely with trivial rephrasing. These "measurement phantoms" (statistical artifacts masquerading as psychological phenomena) threaten the validity of a growing body of research. Guided by the dual-validity framework that integrates psychometrics with causal inference, we present a six-stage workflow that scales validity requirements to research ambition: using LLMs to code text requires basic reliability and accuracy, while claims about psychological properties demand comprehensive construct validation. Researchers…
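The abstract's lowest validity tier, using an LLM to code text, asks for basic reliability and accuracy. As a minimal sketch under stated assumptions (not the paper's own procedure), the checks below compare LLM codes with human reference codes using raw accuracy and Cohen's kappa, a common chance-corrected agreement statistic, and measure how often a model's answers stay the same across prompt rephrasings, echoing the punctuation and wording sensitivity the abstract describes. The function names, category labels, and data are hypothetical placeholders.

```python
from collections import Counter
from typing import Hashable, Sequence

# Hypothetical helpers for illustration; not taken from the paper.


def accuracy(llm_codes: Sequence[Hashable], human_codes: Sequence[Hashable]) -> float:
    """Share of items on which the LLM's code matches the human reference code."""
    if len(llm_codes) != len(human_codes) or not llm_codes:
        raise ValueError("expected two non-empty code sequences of equal length")
    return sum(a == b for a, b in zip(llm_codes, human_codes)) / len(llm_codes)


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(rater_a)
    p_observed = accuracy(rater_a, rater_b)  # also validates the inputs
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when both raters use one identical category
    return (p_observed - p_expected) / (1.0 - p_expected)


def perturbation_consistency(answers_by_variant: Sequence[Sequence[Hashable]]) -> float:
    """Share of items whose answer is identical across every prompt variant.

    answers_by_variant[v][i] is the model's answer to item i under prompt variant v
    (e.g., the same item with rephrased wording or changed punctuation).
    """
    if len(answers_by_variant) < 2:
        raise ValueError("need at least two prompt variants to compare")
    n_items = len(answers_by_variant[0])
    if n_items == 0 or any(len(v) != n_items for v in answers_by_variant):
        raise ValueError("every variant must answer the same non-empty item set")
    stable = sum(
        all(variant[i] == answers_by_variant[0][i] for variant in answers_by_variant)
        for i in range(n_items)
    )
    return stable / n_items


# Hypothetical toy data: six texts coded by a human and by an LLM.
human = ["pos", "neg", "pos", "neu", "neg", "pos"]
llm = ["pos", "neg", "neu", "neu", "neg", "pos"]
print(round(accuracy(llm, human), 3))      # 0.833
print(round(cohens_kappa(llm, human), 3))  # 0.75

# Hypothetical answers to three items under three prompt variants.
variants = [["A", "B", "A"], ["A", "B", "B"], ["A", "B", "A"]]
print(round(perturbation_consistency(variants), 3))  # 0.667
```

Kappa is reported alongside accuracy because raw agreement can look high simply when one category dominates; the consistency score flags items whose answers change under superficial prompt edits, which is the kind of instability the abstract calls a measurement phantom.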
