A Shared Standard for Valid Measurement of Generative AI Systems' Capabilities, Risks, and Impacts
Alexandra Chouldechova, Chad Atalla, Solon Barocas, A. Feder Cooper, Emily Corvi, P. Alex Dow, Jean Garcia-Gathright, Nicholas Pangakis, Stefanie Reed, Emily Sheng, Dan Vann, Matthew Vogel, Hannah Washington, Hanna Wallach

TL;DR
This paper proposes a shared, theoretically grounded standard for evaluating generative AI systems, aiming to unify diverse practices and improve reliability, validity, and comparability of assessments.
Contribution
It introduces a measurement framework based on social science principles to systematize, operationalize, and apply evaluation concepts, contexts, and metrics for GenAI.
Findings
Framework unifies diverse evaluation practices
Enables better understanding and comparison of evaluations
Supports development of a formalized science of GenAI assessment
Abstract
The valid measurement of generative AI (GenAI) systems' capabilities, risks, and impacts forms the bedrock of our ability to evaluate these systems. We introduce a shared standard for valid measurement that helps place many of the disparate-seeming evaluation practices in use today on a common footing. Our framework, grounded in measurement theory from the social sciences, extends the work of Adcock & Collier (2001), in which the authors formalized valid measurement of concepts in political science via three processes: systematizing background concepts, operationalizing systematized concepts via annotation procedures, and applying those procedures to instances. We argue that valid measurement of GenAI systems' capabilities, risks, and impacts further requires systematizing, operationalizing, and applying not only the entailed concepts, but also the contexts of interest and the metrics…
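As a rough illustration of the three processes named in the abstract (systematizing, operationalizing, applying), here is a minimal Python sketch. It is not the authors' implementation. The "refusal" concept, the `SystematizedConcept` class, the `annotate_refusal` procedure, and the `rate` metric are all hypothetical and chosen only to make the steps concrete.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SystematizedConcept:
    """Hypothetical container: a background concept plus an explicit definition."""
    background: str   # broad, informal concept
    definition: str   # explicit definition chosen for this measurement


def annotate_refusal(output: str) -> int:
    """Toy annotation procedure (assumption): 1 if the output opens with a refusal phrase."""
    return int(output.strip().lower().startswith(("i can't", "i cannot")))


def apply_procedure(procedure: Callable[[str], int], instances: List[str]) -> List[int]:
    """Apply an annotation procedure to each instance (here, system outputs)."""
    return [procedure(x) for x in instances]


def rate(labels: List[int]) -> float:
    """Toy metric (assumption): fraction of instances labeled 1."""
    return sum(labels) / len(labels) if labels else 0.0


# Systematize: narrow the background concept to an explicit definition.
concept = SystematizedConcept(
    background="refusal",
    definition="output begins with an explicit statement of inability",
)

# Operationalize and apply: run the annotation procedure over instances.
outputs = ["I can't help with that.", "Sure, here is a summary.", "I cannot comply."]
labels = apply_procedure(annotate_refusal, outputs)

# Aggregate the labels with a metric.
refusal_rate = rate(labels)
```

Keeping the definition, the annotation procedure, and the metric as separate objects makes each choice visible, so each can be questioned on its own. That separation is the general point of the framework as summarized above.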
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Technology Assessment and Management
