Establishing Construct Validity in LLM Capability Benchmarks Requires Nomological Networks
Timo Freiesleben

TL;DR
This paper argues that establishing construct validity for LLM capabilities requires nomological networks, which link theoretical constructs to empirical measurements and thereby make benchmark results easier to interpret.
Contribution
The paper advocates adopting the nomological account of construct validity in LLM research, on the grounds that it provides a clearer framework for linking capabilities to their measurements.
Findings
The nomological account offers a suitable foundation for validating LLM capabilities.
It avoids the strong ontological commitments of the causal account.
It provides a framework for assessing reasoning in LLMs.
Abstract
Recent work in machine learning increasingly attributes human-like capabilities such as reasoning or theory of mind to large language models (LLMs) on the basis of benchmark performance. This paper examines this practice through the lens of construct validity, understood as the problem of linking theoretical capabilities to their empirical measurements. It contrasts three influential frameworks: the nomological account developed by Cronbach and Meehl, the inferential account proposed by Messick and refined by Kane, and Borsboom's causal account. I argue that the nomological account provides the most suitable foundation for current LLM capability research. It avoids the strong ontological commitments of the causal account while offering a more substantive framework for articulating construct meaning than the inferential account. I explore the conceptual implications of adopting the…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education
