Towards Trustworthy AutoGrading of Short, Multi-lingual, Multi-type Answers
Johannes Schneider, Robin Richner, Micha Riser

TL;DR
This paper demonstrates that fine-tuned transformer models can effectively autograde diverse, multilingual short answers with high accuracy, while involving humans to enhance trustworthiness and control error types, addressing ethical concerns.
Contribution
It introduces a large, multilingual dataset for autograding and shows how to improve trustworthiness by involving humans and enabling teacher control over errors.
Findings
Achieved 86.5% accuracy with transformer models on complex datasets.
Involving humans in the grading process raises autograding accuracy to the level of teaching assistants (TAs).
Teachers can control and validate autograder performance effectively.
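The summary above does not say how humans are brought into the loop or how teachers control error types. One common way to get both effects is confidence-based deferral: the model grades only the answers it is confident about and sends the rest to a person. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' method. The `Prediction` class, the `route` function, the threshold names, and the sample probabilities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    answer_id: str
    p_correct: float  # hypothetical model probability that the answer is correct


def route(pred: Prediction, accept_at: float = 0.9, reject_at: float = 0.1) -> str:
    """Return 'correct', 'incorrect', or 'human' for one prediction.

    Assumption: the grader exposes a probability per answer. Answers whose
    probability falls between the two thresholds are deferred to a human.
    """
    if pred.p_correct >= accept_at:
        return "correct"
    if pred.p_correct <= reject_at:
        return "incorrect"
    return "human"


# Made-up example probabilities, for illustration only.
preds = [
    Prediction("a1", 0.97),
    Prediction("a2", 0.55),
    Prediction("a3", 0.04),
    Prediction("a4", 0.80),
]

decisions = {p.answer_id: route(p) for p in preds}
deferred = [answer_id for answer_id, d in decisions.items() if d == "human"]
```

In a setup like this, the two thresholds are the teacher's control over error types: raising `accept_at` reduces the chance of marking a wrong answer as correct, lowering `reject_at` reduces the chance of marking a right answer as wrong, and both changes send more answers to human review.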
Abstract
Autograding short textual answers has become much more feasible due to the rise of NLP and the increased availability of question-answer pairs brought about by a shift to online education. Autograding performance is still inferior to human grading. The statistical and black-box nature of state-of-the-art machine learning models makes them untrustworthy, raising ethical concerns and limiting their practical utility. Furthermore, the evaluation of autograding is typically confined to small, monolingual datasets for a specific question type. This study uses a large dataset consisting of about 10 million question-answer pairs from multiple languages covering diverse fields such as math and language, and strong variation in question and answer syntax. We demonstrate the effectiveness of fine-tuning transformer models for autograding for such complex datasets. Our best hyperparameter-tuned…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
