Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments
Yunsung Kim, Mike Hardy, Joseph Tey, Candace Thille, Chris Piech

TL;DR
This paper proposes a stakeholder-centric approach to interpretability in automated educational scoring, introducing four principles and a reference framework that improve transparency without sacrificing accuracy.
Contribution
It develops four interpretability principles and the AnalyticScore framework, advancing transparent automated scoring aligned with stakeholder needs.
Findings
AnalyticScore outperforms many uninterpretable methods in accuracy.
Its accuracy is within 0.06 QWK (quadratic weighted kappa) of the state of the art on 10 items from ASAP-SAS.
Featurization behavior aligns well with human annotators.
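The QWK figure in the findings is quadratic weighted kappa, an agreement statistic between two sets of ordinal scores in which disagreements are penalized by the squared distance between the scores. The following is a minimal sketch of the standard definition, not the paper's evaluation code. The function name, the score range arguments, and the sample scores are hypothetical and chosen only for illustration.

```python
def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Quadratic weighted kappa between two lists of integer scores.

    Standard textbook definition; an illustrative sketch, not the
    evaluation code used in the paper.
    """
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    k = max_score - min_score + 1  # number of score categories
    if k < 2:
        raise ValueError("need at least two score categories")
    n = len(human)

    # observed[i][j] counts responses scored i by the human and j by the machine
    observed = [[0] * k for _ in range(k)]
    for h, m in zip(human, machine):
        observed[h - min_score][m - min_score] += 1

    # Marginal score histograms for each rater
    hist_human = [sum(row) for row in observed]
    hist_machine = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic penalty
            weighted_observed += weight * observed[i][j]
            # Count expected under chance agreement given the marginals
            weighted_expected += weight * hist_human[i] * hist_machine[j] / n

    if weighted_expected == 0:
        # Both raters gave every response the same single score
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a 0-3 scale, for illustration only
human_scores = [0, 1, 2, 3, 2, 1]
machine_scores = [0, 1, 2, 3, 1, 1]
kappa = quadratic_weighted_kappa(human_scores, machine_scores, 0, 3)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. A gap of 0.06 QWK between two scoring systems is therefore a difference on this scale.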
Abstract
AI-driven automated scoring systems offer scalable and efficient means of evaluating complex student-generated responses. Yet, despite increasing demand for transparency and interpretability, the field has yet to develop a widely accepted solution for interpretable automated scoring to be used in large-scale real-world assessments. This work takes a principled approach to address this challenge. We analyze the needs and potential benefits of interpretable automated scoring for various assessment stakeholder groups and develop four principles of interpretability -- (F)aithfulness, (G)roundedness, (T)raceability, and (I)nterchangeability (FGTI) -- targeted at those needs. To illustrate the feasibility of implementing these principles, we develop the AnalyticScore framework as a reference framework. When applied to the domain of text-based constructed-response scoring, AnalyticScore…
