Foundations for NLP-assisted formative assessment feedback for short-answer tasks in large-enrollment classes
Susan Lloyd, Matthew Beckman, Dennis Pearl, Rebecca Passonneau, Zhaohui Li, and Zekun Wang

TL;DR
This study explores NLP algorithms to assist formative assessment of short-answer tasks in large classes, demonstrating high inter-rater agreement and proposing cluster analysis for scalable feedback.
Contribution
It introduces a method combining NLP scoring with cluster analysis to facilitate scalable formative assessment in large-enrollment classes.
Findings
Pairwise inter-rater agreement: QWK > 0.74 for every rater pair
Group consensus: Fleiss' kappa = 0.68
Intra-rater agreement (one rater re-scoring 178 responses seven years later): QWK = 0.89
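For readers unfamiliar with the two agreement statistics above, here is a minimal Python sketch of their textbook definitions. It is not the authors' code. It assumes that rubric scores are encoded as integers 0..k-1 and that every response in the Fleiss computation was scored by the same number of raters. The function names are hypothetical. Quadratic weighted kappa penalizes a disagreement by the squared distance between the two scores. Fleiss' kappa is an unweighted, nominal-category measure of agreement across three or more raters.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_levels):
    """Cohen's kappa with quadratic weights for two raters.

    Scores are assumed (hypothetically) to be integers in 0..num_levels-1.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    if num_levels < 2:
        raise ValueError("need at least two score levels")
    if not all(0 <= s < num_levels for s in (*rater_a, *rater_b)):
        raise ValueError("scores must lie in 0..num_levels-1")
    n, k = len(rater_a), num_levels
    # observed[i][j] counts responses scored i by rater A and j by rater B.
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(rater_a, rater_b):
        observed[x][y] += 1
    # Marginal score distribution of each rater.
    hist_a = [sum(observed[i]) for i in range(k)]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic penalty
            expected = hist_a[i] * hist_b[j] / n  # count expected by chance
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * expected
    if disagree_exp == 0:
        # Both raters gave every response the same single score (a convention).
        return 1.0
    return 1.0 - disagree_obs / disagree_exp


def fleiss_kappa(counts):
    """Fleiss' kappa for many raters.

    counts[i][j] = number of raters who placed response i in category j;
    every response must have the same number of ratings (at least 2).
    """
    if not counts:
        raise ValueError("need at least one rated response")
    num_items = len(counts)
    raters = sum(counts[0])
    if raters < 2 or any(sum(row) != raters for row in counts):
        raise ValueError("each response needs the same number (>= 2) of ratings")
    num_cats = len(counts[0])
    # Share of all ratings that fell into each category.
    p = [sum(row[j] for row in counts) / (num_items * raters) for j in range(num_cats)]
    # Per-response agreement: fraction of rater pairs that agree on it.
    per_item = [(sum(c * c for c in row) - raters) / (raters * (raters - 1))
                for row in counts]
    p_bar = sum(per_item) / num_items
    p_e = sum(x * x for x in p)
    if abs(1.0 - p_e) < 1e-12:
        raise ValueError("all ratings fall in one category; kappa is undefined")
    return (p_bar - p_e) / (1.0 - p_e)
```

For example, `quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 2, 1], 3)` gives 0.8. The one disagreement is a single rubric level apart, so it costs only a quarter of what a two-level disagreement would.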
Abstract
Research suggests "write-to-learn" tasks improve learning outcomes, yet constructed-response methods of formative assessment become unwieldy with large class sizes. This study evaluates whether natural language processing (NLP) algorithms can assist with such assessment. Six short-answer tasks completed by 1,935 students were scored by several human raters using a detailed rubric, and by an algorithm. Results indicate substantial inter-rater agreement, measured with quadratic weighted kappa (QWK) for rater pairs (each QWK > 0.74) and with Fleiss' kappa for group consensus (0.68). Additionally, intra-rater agreement was estimated for one rater who had scored 178 responses seven years earlier (QWK = 0.89). Given this strong rater agreement, the study then pilots cluster analysis of response text, with the aim of letting instructors ascribe meaning to clusters as a means of scalable formative assessment.
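The abstract does not say which clustering method the pilot uses. Purely as a hypothetical illustration, the sketch below uses one common approach: it represents each response as a TF-IDF vector and groups the responses with spherical k-means, which is k-means under cosine similarity. An instructor could then read a few responses from each cluster and write one piece of feedback for the whole group. The tokenizer, the smoothed IDF formula, the farthest-first seeding, and the example responses are all assumptions made for illustration, not details from the paper. The sketch uses only the Python standard library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase word tokens; a deliberately simple, hypothetical tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def normalize(vec):
    # Scale a sparse vector (dict term -> weight) to unit length.
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (dict term -> weight) per response."""
    term_counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    doc_freq = Counter(t for tc in term_counts for t in tc)
    # Smoothed IDF: terms that appear in every response keep a small weight.
    return [normalize({t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1)
                       for t, c in tc.items()})
            for tc in term_counts]


def cosine(u, v):
    # Inputs are unit length (or empty), so the dot product is the cosine.
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def spherical_kmeans(vectors, k, max_iter=20):
    """Cluster unit vectors by cosine similarity; returns one label per vector."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of responses")
    # Deterministic farthest-first seeding keeps the sketch reproducible.
    centers = [vectors[0]]
    while len(centers) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centers))
        centers.append(vectors[far])
    labels = None
    for _ in range(max_iter):
        # Assign each response to its most similar center.
        new_labels = [max(range(k), key=lambda j: cosine(v, centers[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Recompute each center as the normalized sum of its members.
        for j in range(k):
            total = defaultdict(float)
            for v, lab in zip(vectors, labels):
                if lab == j:
                    for t, w in v.items():
                        total[t] += w
            if total:  # keep the old center if the cluster is empty
                centers[j] = normalize(total)
    return labels


if __name__ == "__main__":
    # Made-up responses, for illustration only.
    responses = [
        "the p value is the probability of the data given the null hypothesis",
        "p value probability data null hypothesis",
        "a larger sample size gives a narrower confidence interval",
        "narrower confidence interval with larger sample size",
    ]
    for label, text in sorted(zip(spherical_kmeans(tfidf_vectors(responses), k=2),
                                  responses)):
        print(label, text)
```

In practice, k and the text representation would need tuning against human-scored responses such as those described above. Sentence embeddings are a common substitute for TF-IDF when responses paraphrase one another.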
Taxonomy
Topics: Student Assessment and Feedback · Education and Critical Thinking Development · Educational Assessment and Pedagogy
