The effects of data size on Automated Essay Scoring engines
Christopher Ormerod, Amir Jafari, Susan Lottridge, Milan Patel, Amy Harris, and Paul van Wamelen

TL;DR
This paper investigates how data size and quality impact the performance of different Automated Essay Scoring models, including feature-based, RNN, and transformer-based approaches, to guide better training data practices.
Contribution
It compares the effects of data size and quality across three AES paradigms, providing insights for optimizing training data for neural network models in production.
Findings
Neural network models benefit significantly from larger, high-quality data.
Feature-based models are less sensitive to data size variations.
Transformer models outperform other paradigms with sufficient high-quality data.
Abstract
We study the effects of data size and quality on the performance of Automated Essay Scoring (AES) engines that are designed in accordance with three different paradigms: a frequency and hand-crafted feature-based model, a recurrent neural network model, and a pretrained transformer-based language model that is fine-tuned for classification. We expect that each type of model benefits from the size and the quality of the training data in very different ways. Standard practices for developing training data for AES engines were established with feature-based methods in mind. However, since neural networks are increasingly being considered in a production setting, this work seeks to inform how to establish better training data for neural networks that will be used in production.
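One common way to probe the effect of training-set size is a learning curve: train the same engine on progressively larger subsets and score each on a fixed held-out set. The sketch below is only an illustration under stated assumptions and is not the authors' procedure. The names `learning_curve` and `train_majority`, the `(text, score)` data shape, and the use of plain accuracy as the metric are all hypothetical choices made for this example.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical data shape: (essay text, human-assigned score).
Example = Tuple[str, int]
Scorer = Callable[[str], int]


def learning_curve(
    train_fn: Callable[[Sequence[Example]], Scorer],
    train_set: Sequence[Example],
    test_set: Sequence[Example],
    sizes: Sequence[int],
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Train on nested random subsets of train_set; return (size, accuracy) pairs."""
    rng = random.Random(seed)
    shuffled = list(train_set)
    # Shuffle once so that every smaller subset is a prefix of each larger one.
    rng.shuffle(shuffled)
    curve = []
    for n in sizes:
        if n < 1 or n > len(shuffled):
            raise ValueError(f"subset size {n} is outside 1..{len(shuffled)}")
        model = train_fn(shuffled[:n])
        correct = sum(model(text) == score for text, score in test_set)
        # Accuracy is an assumed stand-in metric, chosen only for simplicity.
        curve.append((n, correct / len(test_set)))
    return curve


def train_majority(examples: Sequence[Example]) -> Scorer:
    """Toy stand-in for a scoring engine: always predicts the most common score."""
    counts = {}
    for _, score in examples:
        counts[score] = counts.get(score, 0) + 1
    # Ties are broken in favour of the lowest score.
    majority = max(sorted(counts), key=counts.get)
    return lambda text: majority


if __name__ == "__main__":
    train = [("essay a", 2)] * 8 + [("essay b", 3)] * 2
    test = [("w", 2), ("x", 2), ("y", 3), ("z", 2)]
    for size, acc in learning_curve(train_majority, train, test, sizes=[5, 10]):
        print(size, acc)
```

Any real engine (feature-based, recurrent, or transformer-based) could be swapped in for `train_majority` by supplying a different `train_fn`; keeping the test set fixed across sizes makes the resulting points comparable.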
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques
