Comparing Human and Automated Evaluation of Open-Ended Student Responses to Questions of Evolution

Michael J Wiser; Louise S Mead; James J Smith; Robert T Pennock

arXiv:1603.07029 · cs.AI · May 8, 2018

TL;DR

This study compares human scoring and machine learning (ML)-based scoring of student responses to evolution questions. It finds substantial inter-rater reliability but also systematic differences, suggesting that ML scoring is better suited to formative assessment than to final grading.

Contribution

The paper evaluates how well EvoGrader scores student responses to questions distinct from those it was trained on, and identifies its potential and limitations relative to human scoring.

Findings

1. Substantial inter-rater reliability between human and ML scores
2. Systematic differences suggest ML scoring should be used for formative assessment
3. ML scoring is less suitable for summative evaluation

Abstract

Written responses can provide a wealth of data for understanding student reasoning on a topic. Yet they are time- and labor-intensive to score, leading many instructors to forgo them except as limited parts of summative assessments at the end of a unit or course. Recent developments in Machine Learning (ML) have produced computational methods of scoring written responses for the presence or absence of specific concepts. Here, we compare the scores from one particular ML program, EvoGrader, to human scoring of responses to structurally- and content-similar questions that are distinct from the ones the program was trained on. We find that there is substantial inter-rater reliability between the human and ML scoring. However, sufficient systematic differences remain between the human and ML scoring that we advise using the ML scoring only for formative, rather than summative,…
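The abstract reports inter-rater reliability between human and ML scores without naming the statistic in this summary. As a minimal sketch, assuming each response is scored 1 (concept present) or 0 (concept absent) for a single concept, and assuming Cohen's kappa as the agreement measure (a common choice for two raters, not confirmed here as the paper's exact statistic), agreement between a human scorer and an automated scorer could be computed as below. The function name `cohens_kappa` and the score lists are hypothetical illustrations, not EvoGrader's API or data from the paper.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(human: Sequence[int], machine: Sequence[int]) -> float:
    """Cohen's kappa for two raters scoring the same responses.

    Assumption: each score is 1 if the rater judged a concept present,
    0 if absent. This is an illustrative sketch, not EvoGrader's code.
    """
    if len(human) != len(machine) or not human:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(human)
    # Observed agreement: fraction of responses on which both raters agree.
    p_observed = sum(h == m for h, m in zip(human, machine)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    h_counts, m_counts = Counter(human), Counter(machine)
    labels = set(h_counts) | set(m_counts)
    p_expected = sum((h_counts[c] / n) * (m_counts[c] / n) for c in labels)
    if p_expected == 1.0:
        # Both raters used one identical label for every response;
        # kappa is undefined in this case.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical scores for eight responses on one concept (not data from the paper).
human_scores = [1, 1, 0, 1, 0, 0, 1, 0]
machine_scores = [1, 1, 0, 0, 0, 0, 1, 1]
print(cohens_kappa(human_scores, machine_scores))  # 0.5
```

Here the raters agree on 75% of responses, but because each labels half the responses "present," 50% agreement is expected by chance, so kappa is 0.5. Kappa is preferred over raw percent agreement for this reason: it discounts agreement that would occur by chance, which matters when one label (for example, "concept absent") dominates.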