Prompt Engineering for Scale Development in Generative Psychometrics
Lara Lee Russell-Lasalandra, Hudson Golino

TL;DR
This study uses simulation to show that adaptive prompt engineering significantly improves the quality and validity of personality assessment items generated by large language models, especially with more capable models.
Contribution
It introduces adaptive prompting as a superior strategy for generating psychometric items, demonstrating its effectiveness across different models and settings.
Findings
Adaptive prompting reduces semantic redundancy.
It enhances pre-reduction structural validity.
Benefits increase with model capacity.
Abstract
This Monte Carlo simulation examines how prompt engineering strategies shape the quality of large language model (LLM)-generated personality assessment items within the AI-GENIE framework for generative psychometrics. Item pools targeting the Big Five traits were generated using multiple prompting designs (zero-shot, few-shot, persona-based, and adaptive), model temperatures, and LLMs, then evaluated and reduced using network psychometric methods. Across all conditions, AI-GENIE reliably improved structural validity following reduction, with the magnitude of its incremental contribution inversely related to the quality of the incoming item pool. Prompt design exerted a substantial influence on both pre- and post-reduction item quality. Adaptive prompting consistently outperformed non-adaptive strategies by sharply reducing semantic redundancy, elevating pre-reduction structural…
Taxonomy
Topics: Mental Health Research Topics · Personality Traits and Psychology · Psychometric Methodologies and Testing
