Evaluating Generative AI Systems is a Social Science Measurement Challenge
Hanna Wallach, Meera Desai, Nicholas Pangakis, A. Feder Cooper, Angelina Wang, Solon Barocas, Alexandra Chouldechova, Chad Atalla, Su Lin Blodgett, Emily Corvi, P. Alex Dow, Jean Garcia-Gathright, Alexandra Olteanu, Stefanie Reed, Emily Sheng, Dan Vann

TL;DR
This paper proposes a social science-inspired measurement framework for evaluating generative AI systems, emphasizing conceptual clarity, validity, and stakeholder participation to improve assessment rigor.
Contribution
It introduces a four-level measurement framework grounded in social science theory, addressing gaps in current ML evaluation practices.
Findings
Framework clarifies measurement assumptions
Enables diverse stakeholder participation
Improves validity and interpretability of evaluations
Abstract
Across academia, industry, and government, there is an increasing awareness that the measurement tasks involved in evaluating generative AI (GenAI) systems are especially difficult. We argue that these measurement tasks are highly reminiscent of measurement tasks found throughout the social sciences. With this in mind, we present a framework, grounded in measurement theory from the social sciences, for measuring concepts related to the capabilities, impacts, opportunities, and risks of GenAI systems. The framework distinguishes between four levels: the background concept, the systematized concept, the measurement instrument(s), and the instance-level measurements themselves. This four-level approach differs from the way measurement is typically done in ML, where researchers and practitioners appear to jump straight from background concepts to measurement instruments, with little to no…
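As a rough illustration of the four levels named in the abstract, the sketch below models them as one small Python data structure. It is not taken from the paper: the class name `MeasurementChain`, the instrument `within_word_limit`, and the "concise responses" example are all hypothetical, chosen only to show how each level can be written down explicitly instead of jumping from a background concept straight to an instrument.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MeasurementChain:
    """Hypothetical container for the four levels described in the abstract."""
    background_concept: str      # level 1: the broad, informally understood concept
    systematized_concept: str    # level 2: the explicit definition adopted for this evaluation
    instruments: List[Callable[[str], float]] = field(default_factory=list)  # level 3

    def measure(self, outputs: List[str]) -> List[List[float]]:
        # Level 4: instance-level measurements, one row per system output
        # and one score per instrument.
        return [[instrument(o) for instrument in self.instruments] for o in outputs]


def within_word_limit(text: str, limit: int = 20) -> float:
    """Toy instrument (an assumption for illustration): 1.0 if the text has at most `limit` words."""
    return 1.0 if len(text.split()) <= limit else 0.0


chain = MeasurementChain(
    background_concept="concise responses",
    systematized_concept="a response is concise if it contains at most 20 words",
    instruments=[within_word_limit],
)

measurements = chain.measure(["short answer", "word " * 30])
print(measurements)
```

Writing the systematized concept as its own field makes the assumption behind the instrument visible, so it can be questioned separately from the instrument's implementation.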
Taxonomy
Topics: Computational and Text Analysis Methods
