TL;DR
This paper reviews the emerging field of LLM psychometrics, which applies psychological measurement principles to evaluate and improve large language models. It addresses the challenges of measuring human-like psychological constructs and establishing human-centered evaluation.
Contribution
It introduces a structured framework for LLM psychometrics, synthesizing interdisciplinary methods and providing actionable insights for future evaluation paradigms.
Findings
The reviewed literature systematically shapes benchmarking principles.
Evaluation scope is broadened beyond static, task-specific benchmarks.
Methodologies are refined and results are more rigorously validated.
Abstract
The advancement of large language models (LLMs) has outpaced traditional evaluation methodologies. This progress presents novel challenges, such as measuring human-like psychological constructs, moving beyond static and task-specific benchmarks, and establishing human-centered evaluation. These challenges intersect with psychometrics, the science of quantifying the intangible aspects of human psychology, such as personality, values, and intelligence. This review paper introduces and synthesizes the emerging interdisciplinary field of LLM Psychometrics, which leverages psychometric instruments, theories, and principles to evaluate, understand, and enhance LLMs. The reviewed literature systematically shapes benchmarking principles, broadens evaluation scopes, refines methodologies, validates results, and advances LLM capabilities. Diverse perspectives are integrated to provide a…
