A validity-guided workflow for robust large language model research in psychology
Zhicheng Lin

TL;DR
This paper proposes a six-stage, validity-guided workflow for conducting robust and reliable research using large language models in psychology, emphasizing validation, transparency, and systematic analysis.
Contribution
It introduces a comprehensive workflow integrating psychometrics and causal inference to improve validity in LLM-based psychological research.
Findings
Validated computational instruments for LLMs in psychology
Demonstrated that systematic validation distinguishes genuine phenomena from artifacts
Provided guidelines for transparent and reliable LLM research practices
Abstract
Large language models (LLMs) are rapidly being integrated into psychological research as research tools, evaluation targets, human simulators, and cognitive models. However, recent evidence reveals severe measurement unreliability: personality assessments collapse under factor analysis, moral preferences reverse with punctuation changes, and theory-of-mind accuracy varies widely with trivial rephrasing. These "measurement phantoms" (statistical artifacts masquerading as psychological phenomena) threaten the validity of a growing body of research. Guided by the dual-validity framework that integrates psychometrics with causal inference, we present a six-stage workflow that scales validity requirements to research ambition: using LLMs to code text requires basic reliability and accuracy, while claims about psychological properties demand comprehensive construct validation. Researchers…
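The abstract's baseline requirement for text coding, "basic reliability and accuracy", can be illustrated with a minimal sketch. It is not the paper's procedure: the function names, the labels, and the choice of Cohen's kappa as the chance-corrected agreement statistic are all assumptions made for illustration, and the data are invented toy values.

```python
from collections import Counter


def accuracy(predicted, reference):
    """Fraction of items where the LLM label matches the reference label."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(predicted)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal label rates.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: human codes versus LLM codes for six texts.
human_codes = ["pos", "neg", "pos", "neg", "pos", "neg"]
llm_codes = ["pos", "neg", "pos", "pos", "pos", "neg"]

acc = accuracy(llm_codes, human_codes)
kappa = cohen_kappa(llm_codes, human_codes)
print(f"accuracy={acc:.3f}, kappa={kappa:.3f}")
```

On these toy labels, raw agreement is 5/6 while kappa is 2/3, which shows why a chance-corrected statistic is usually reported alongside accuracy. The same comparison could be repeated across rephrased prompts to check whether agreement is stable, in the spirit of the sensitivity problems the abstract describes.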
