A Shared Standard for Valid Measurement of Generative AI Systems' Capabilities, Risks, and Impacts

Alexandra Chouldechova; Chad Atalla; Solon Barocas; A. Feder Cooper; Emily Corvi; P. Alex Dow; Jean Garcia-Gathright; Nicholas Pangakis; Stefanie Reed; Emily Sheng; Dan Vann; Matthew Vogel; Hannah Washington; Hanna Wallach

arXiv:2412.01934·cs.CY·December 4, 2024

Open Access

TL;DR

This paper proposes a shared, theoretically grounded standard for evaluating generative AI systems, aiming to unify diverse practices and improve reliability, validity, and comparability of assessments.

Contribution

It introduces a measurement framework based on social science principles to systematize, operationalize, and apply evaluation concepts, contexts, and metrics for GenAI.

Findings

01. Framework unifies diverse evaluation practices

02. Enables better understanding and comparison of evaluations

03. Supports development of a formalized science of GenAI assessment

Abstract

The valid measurement of generative AI (GenAI) systems' capabilities, risks, and impacts forms the bedrock of our ability to evaluate these systems. We introduce a shared standard for valid measurement that helps place many of the disparate-seeming evaluation practices in use today on a common footing. Our framework, grounded in measurement theory from the social sciences, extends the work of Adcock & Collier (2001), in which the authors formalized valid measurement of concepts in political science via three processes: systematizing background concepts, operationalizing systematized concepts via annotation procedures, and applying those procedures to instances. We argue that valid measurement of GenAI systems' capabilities, risks, and impacts further requires systematizing, operationalizing, and applying not only the entailed concepts, but also the contexts of interest and the metrics…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Technology Assessment and Management