Effective sampling for large-scale automated writing evaluation systems

Nicholas Dronen; Peter W. Foltz; Kyle Habermehl

arXiv:1412.5659·cs.CL·December 19, 2014

TL;DR

This paper investigates efficient sampling algorithms to train automated writing evaluation systems with fewer essays, maintaining high accuracy while reducing costs in large-scale educational settings.

Contribution

It evaluates sampling algorithms that select the most informative essays for training, so that models can reach strong predictive performance with smaller human-scored training sets.

Findings

01. Minimized training set sizes while maintaining accuracy

02. Reduced costs of human scoring in large-scale AWE

03. Enhanced efficiency of AWE system training processes

Abstract

Automated writing evaluation (AWE) has been shown to be an effective mechanism for quickly providing feedback to students. It has already seen wide adoption in enterprise-scale applications and is starting to be adopted in large-scale contexts. Training an AWE model has historically required a single batch of several hundred writing examples and human scores for each of them. This requirement limits large-scale adoption of AWE since human-scoring essays is costly. Here we evaluate algorithms for ensuring that AWE models are consistently trained using the most informative essays. Our results show how to minimize training set sizes while maximizing predictive performance, thereby reducing cost without unduly sacrificing accuracy. We conclude with a discussion of how to integrate this approach into large-scale AWE systems.
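
The abstract does not name the specific sampling algorithms evaluated, so the following is only a minimal sketch of the general idea of choosing which essays to send for human scoring under a fixed budget. It uses greedy farthest-point selection over essay feature vectors as an illustrative stand-in, not the authors' method. The function name `select_essays_to_score`, the `budget` and `seed_index` parameters, and the toy two-dimensional feature vectors are all assumptions made for this example.

```python
import math


def select_essays_to_score(features, budget, seed_index=0):
    """Pick `budget` essay indices that cover the feature space well.

    Illustrative stand-in (greedy farthest-point selection), not the
    algorithm from the paper. `features` is a list of equal-length
    numeric vectors, one per unscored essay (hypothetical input).
    """
    n = len(features)
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of essays")

    selected = [seed_index]
    chosen = {seed_index}
    # Distance from each essay to its nearest already-selected essay.
    nearest = [math.dist(f, features[seed_index]) for f in features]

    while len(selected) < budget:
        # Choose the unselected essay farthest from everything chosen so far.
        candidates = [i for i in range(n) if i not in chosen]
        next_index = max(candidates, key=lambda i: nearest[i])
        selected.append(next_index)
        chosen.add(next_index)
        # Refresh nearest-selected distances with the new pick.
        for i, f in enumerate(features):
            nearest[i] = min(nearest[i], math.dist(f, features[next_index]))

    return selected


if __name__ == "__main__":
    # Toy feature vectors: two tight clusters plus one isolated essay.
    toy_features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 5.0]]
    print(select_essays_to_score(toy_features, budget=3))
```

In a setting like the one the abstract describes, only the selected essays would be sent to human scorers, and the scoring model would be trained on that subset. Whether such a diversity-based choice matches the "most informative" criterion used in the paper is not something the abstract settles.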

Taxonomy

Topics: Software Engineering Research · Machine Learning and Algorithms · Topic Modeling