From Feature-Based Models to Generative AI: Validity Evidence for Constructed Response Scoring

Jodi M. Casabianca, Daniel F. McCaffrey, Matthew S. Johnson, Naim Alper, and Vladimir Zubenko

arXiv:2603.19280 · cs.CL · March 23, 2026

Open Access

TL;DR

This paper compares feature-based and generative AI scoring methods for constructed responses, emphasizing the need for extensive validity evidence for generative AI due to transparency and consistency concerns.

Contribution

The paper highlights how validity evidence requirements differ between feature-based and generative AI scoring systems and proposes best practices for collecting that evidence.

Findings

01. Generative AI scoring requires more validity evidence than feature-based methods.

02. Analysis of student essays demonstrates complexities in validating AI scoring systems.

03. Generative AI's lack of transparency impacts validity evidence collection.

Abstract

The rapid advancements in large language models and generative artificial intelligence (AI) capabilities are making their broad application in the high-stakes testing context more likely. Use of generative AI in the scoring of constructed responses is particularly appealing because it reduces the effort required for handcrafting features in traditional AI scoring and might even outperform those methods. The purpose of this paper is to highlight the differences between feature-based and generative AI applications in constructed response scoring systems and to propose a set of best practices for the collection of validity evidence to support the use and interpretation of constructed response scores from scoring systems using generative AI. We compare the validity evidence needed in scoring systems using human ratings, feature-based natural language processing AI scoring engines, and…

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Computational and Text Analysis Methods