Annotation and Classification of Sentence-level Revision Improvement
Tazin Afrin, Diane Litman

TL;DR
This paper introduces a new annotated corpus of student essay revisions, demonstrating how machine learning can predict revision quality and showing that combining expert and non-expert data improves model accuracy.
Contribution
The paper presents a novel annotated dataset of revision quality and a machine learning approach that leverages both expert and non-expert revisions for better prediction.
Findings
Blended expert and non-expert revisions improve model performance
Expert data is particularly important for predicting low-quality revisions
The corpus enables future research on revision quality assessment
Abstract
Studies of writing revisions rarely focus on revision quality. To address this issue, we introduce a corpus of between-draft revisions of student argumentative essays, annotated as to whether each revision improves essay quality. We demonstrate a potential usage of our annotations by developing a machine learning model to predict revision improvement. With the goal of expanding training data, we also extract revisions from a dataset edited by expert proofreaders. Our results indicate that blending expert and non-expert revisions increases model performance, with expert data particularly important for predicting low-quality revisions.
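As a rough illustration of the blending idea in the abstract, the sketch below pools a non-expert set and an expert set of sentence revisions into one training set and fits a classifier that predicts whether a revision improves the text. Everything in it is an assumption made for illustration: the toy sentences and labels are invented and do not come from the paper's corpus, the helper names (`features`, `train`, `predict`) are hypothetical, and a small Naive Bayes classifier over added and deleted tokens stands in for whatever features and model the authors actually used.

```python
import math
from collections import Counter

# Hypothetical toy data: (original sentence, revised sentence, label).
# label 1 = the revision improves the essay, 0 = it does not.
NON_EXPERT = [
    ("School uniforms are good.",
     "School uniforms are good because they reduce peer pressure.", 1),
    ("The author gives three reasons.",
     "The author gives reasons and stuff.", 0),
]
EXPERT = [
    ("He go to school every day.", "He goes to school every day.", 1),
    ("This evidence supports the claim.", "This evidence supports.", 0),
]


def features(original, revised):
    """Count tokens added or deleted by the revision (a bag-of-edits view)."""
    old = Counter(original.lower().split())
    new = Counter(revised.lower().split())
    feats = Counter()
    for tok, n in (new - old).items():
        feats["add:" + tok] += n
    for tok, n in (old - new).items():
        feats["del:" + tok] += n
    return feats


def train(examples):
    """Collect per-class counts for a multinomial Naive Bayes model."""
    class_counts = Counter()
    feat_counts = {0: Counter(), 1: Counter()}
    vocab = set()
    for original, revised, label in examples:
        f = features(original, revised)
        class_counts[label] += 1
        feat_counts[label].update(f)
        vocab.update(f)
    return class_counts, feat_counts, vocab


def predict(model, original, revised):
    """Return the label (0 or 1) with the higher smoothed log-probability."""
    class_counts, feat_counts, vocab = model
    total = sum(class_counts.values())
    f = features(original, revised)
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        # Add-one smoothing on the prior and on the token likelihoods.
        score = math.log((class_counts[label] + 1) / (total + 2))
        denom = sum(feat_counts[label].values()) + len(vocab) + 1
        for tok, n in f.items():
            score += n * math.log((feat_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Blending here is plain concatenation of the two sources before training.
blended = NON_EXPERT + EXPERT
model = train(blended)
```

To compare settings in the spirit of the abstract, one could train separately on `NON_EXPERT`, on `EXPERT`, and on `blended`, then evaluate each model on the same held-out student revisions. With only four invented examples, the predictions here carry no empirical meaning.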
