Validity Arguments For Constructed Response Scoring Using Generative Artificial Intelligence Applications

Jodi M. Casabianca, Daniel F. McCaffrey, Matthew S. Johnson, Naim Alper, and Vladimir Zubenko

arXiv:2501.02334·cs.CL·January 7, 2025

Open Access

TL;DR

This paper examines what validity evidence is needed when generative AI is used to score constructed responses, compares those needs with scoring by human raters and by feature-based NLP engines, and proposes best practices for collecting that evidence.

Contribution

The paper highlights how validity evidence requirements differ for generative AI scoring systems and offers guidelines for supporting score validity in high-stakes testing.

Findings

01 Generative AI scoring requires more extensive validity evidence than feature-based NLP.

02 Constructed response scores from AI can be validated using multiple evidence sources.

03 Combining AI scores from different sources may improve construct coverage.

Abstract

The rapid advancements in large language models and generative artificial intelligence (AI) capabilities are making their broad application in the high-stakes testing context more likely. Use of generative AI in the scoring of constructed responses is particularly appealing because it reduces the effort required for handcrafting features in traditional AI scoring and might even outperform those methods. The purpose of this paper is to highlight the differences in the feature-based and generative AI applications in constructed response scoring systems and propose a set of best practices for the collection of validity evidence to support the use and interpretation of constructed response scores from scoring systems using generative AI. We compare the validity evidence needed in scoring systems using human ratings, feature-based natural language processing AI scoring engines, and…

Taxonomy

Topics: Technology and Data Analysis

Methods: Sparse Evolutionary Training