Calibrating Generative AI to Produce Realistic Essays for Data Augmentation
Edward W. Wolfe, Justin O. Barber

TL;DR
This paper evaluates three large language model prompting strategies for generating realistic essays that preserve the writing quality of the original essays and can augment training data for automated scoring systems.
Contribution
It introduces and compares three prompting approaches to produce high-quality, realistic essays for data augmentation in automated scoring.
Findings
The predict next prompting strategy yields the highest agreement between human raters on simulated essay scores.
The predict next and sentence strategies best preserve the rated quality of the original essay.
The predict next and 25 examples strategies generate the most realistic text as judged by human raters.
Abstract
Data augmentation can mitigate limited training data in machine-learning automated scoring engines (ASEs) for constructed response items. This study seeks to determine how well three approaches to large language model prompting produce essays that preserve the writing quality of the original essays and produce realistic text for augmenting ASE training datasets. We created simulated versions of student essays, and human raters assigned scores to them and rated the realism of the generated text. The results indicate that the predict next prompting strategy produces the highest level of agreement between human raters regarding simulated essay scores, the predict next and sentence strategies best preserve the rated quality of the original essay in the simulated essays, and the predict next and 25 examples strategies produce the most realistic text as judged by human raters.
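The abstract reports agreement between human raters on simulated essay scores but does not say which statistic was used. As a rough illustration only, the sketch below computes two measures that are common in automated scoring work: exact agreement and quadratic weighted kappa. The function names, the 1-4 score scale, and the example scores are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def exact_agreement(r1, r2):
    """Proportion of essays on which the two raters gave the same score."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("rater score lists must be non-empty and equal in length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def quadratic_weighted_kappa(r1, r2, min_score, max_score):
    """Chance-corrected agreement that penalizes large disagreements more."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("rater score lists must be non-empty and equal in length")
    k = max_score - min_score + 1  # number of score categories
    if k < 2:
        raise ValueError("need at least two score categories")
    n = len(r1)
    observed = Counter(zip(r1, r2))  # joint counts of (rater 1, rater 2) scores
    m1, m2 = Counter(r1), Counter(r2)  # marginal counts per rater
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * observed[(i, j)] / n
            den += w * (m1[i] / n) * (m2[j] / n)
    if den == 0:
        # Both raters gave one identical score to every essay.
        return 1.0
    return 1.0 - num / den


# Hypothetical scores from two raters on four simulated essays (1-4 scale).
rater_a = [1, 2, 3, 4]
rater_b = [1, 2, 3, 3]
print(exact_agreement(rater_a, rater_b))
print(quadratic_weighted_kappa(rater_a, rater_b, 1, 4))
```

Computing such a statistic separately for the essays produced by each prompting strategy would allow the kind of comparison the abstract describes, assuming each simulated essay is scored by at least two raters.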
Taxonomy
Topics: Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning · Text Readability and Simplification
