Improving Alignment Between Human and Machine Codes: An Empirical Assessment of Prompt Engineering for Construct Identification in Psychology

Kylie L. Anglin, Stephanie Milan, Brittney Hernandez, and Claudia Ventura

arXiv:2512.03818 · cs.CL · December 4, 2025

Open Access

TL;DR

This paper empirically assesses prompt engineering strategies to improve large language model classification of psychological constructs, emphasizing the importance of construct definition and task framing for aligning AI outputs with expert judgments.

Contribution

The paper introduces an empirical framework for optimizing prompt design in psychology classification tasks and highlights the effectiveness of combining codebook-guided empirical prompt selection with automatic prompt engineering.

Findings

1. Construct definition and task framing are the most influential factors in prompt performance.

2. Few-shot prompts that combine codebook-guided empirical selection with automatic prompt engineering yield the best alignment with expert codes.

3. Systematic prompt evaluation improves classification accuracy in theory-driven domains.
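
The idea of systematic prompt evaluation can be sketched as follows: score each candidate prompt against a small set of human-coded texts and keep the one that agrees best with the human codes. This is a minimal illustration under stated assumptions, not the authors' procedure. Cohen's kappa is used here as one common agreement measure (the summary does not say which metric the paper uses), `toy_classify` is a hypothetical keyword-matching stand-in for an LLM call, and the construct, prompts, and example texts are invented for the demonstration.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def cohen_kappa(human: Sequence[str], machine: Sequence[str]) -> float:
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(human) != len(machine) or not human:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(human)
    observed = sum(h == m for h, m in zip(human, machine)) / n
    h_counts, m_counts = Counter(human), Counter(machine)
    # Agreement expected by chance, from each coder's marginal label frequencies.
    expected = sum(h_counts[c] * m_counts[c] for c in h_counts) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both coders use one identical label;
        # return 1.0 by convention because agreement is perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


def select_prompt(
    candidates: Dict[str, str],
    classify: Callable[[str, str], str],
    dev_set: List[Tuple[str, str]],
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate prompt on human-coded texts and return the best."""
    texts = [text for text, _ in dev_set]
    human = [label for _, label in dev_set]
    scores = {}
    for name, prompt in candidates.items():
        machine = [classify(prompt, text) for text in texts]
        scores[name] = cohen_kappa(human, machine)
    best = max(scores, key=scores.get)
    return best, scores


def toy_classify(prompt: str, text: str) -> str:
    """Hypothetical stand-in for an LLM call (assumption, not a real API).

    Returns 'present' if any cue phrase listed after 'cues:' in the prompt
    occurs in the text, otherwise 'absent'.
    """
    cues = prompt.split("cues:")[1].split(",")
    return "present" if any(c.strip() in text.lower() for c in cues) else "absent"


# Invented candidate prompts: a loose construct definition vs. a tighter one.
candidates = {
    "broad": "Label rumination. cues: think, again",
    "narrow": "Label rumination as repetitive negative thought. "
              "cues: keep thinking, over and over",
}

# Invented human-coded development set of (text, expert label) pairs.
dev_set = [
    ("I keep thinking about my mistake over and over.", "present"),
    ("I think the movie was fine.", "absent"),
    ("We will meet again on Friday.", "absent"),
    ("I replay the argument over and over in my head.", "present"),
]

best, scores = select_prompt(candidates, toy_classify, dev_set)
print(best, scores)
```

In practice `classify` would wrap a real model call, and the winning prompt should be re-checked on held-out human-coded texts, since selecting and evaluating on the same examples overstates agreement.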

Abstract

Due to their architecture and vast pre-training data, large language models (LLMs) demonstrate strong text classification performance. However, LLM output (here, the category assigned to a text) depends heavily on the wording of the prompt. While literature on prompt engineering is expanding, few studies focus on classification tasks, and even fewer address domains like psychology, where constructs have precise, theory-driven definitions that may not be well represented in pre-training data. We present an empirical framework for optimizing LLM performance for identifying constructs in texts via prompt engineering. We experimentally evaluate five prompting strategies (codebook-guided empirical prompt selection, automatic prompt engineering, persona prompting, chain-of-thought reasoning, and explanatory prompting) with zero-shot and few-shot classification. We find that persona,…

Taxonomy

Topics: Topic Modeling · Persona Design and Applications · Mental Health via Writing