Evaluating GenAI for Simplifying Texts for Education: Improving Accuracy and Consistency for Enhanced Readability

Stephanie L. Day; Jacapo Cirica; Steven R. Clapp; Veronika Penkova; Amy E. Giroux; Abbey Banta; Catherine Bordeau; Poojitha Mutteneni; Ben D. Sawyer

arXiv:2501.09158·cs.CL·January 17, 2025·2 cites

Open Access

TL;DR

This study systematically evaluates the accuracy and consistency of large language models and prompting techniques in simplifying educational texts to various reading levels, highlighting their potential and current limitations.

Contribution

Introduced a generalized evaluation framework and metrics for assessing LLMs and prompting techniques in educational text simplification tasks.

Findings

01. Significant differences in LLM and prompt technique performance across metrics.

02. Variable utility of LLMs in achieving targeted grade levels and maintaining key phrases.

03. Demonstrated the potential of LLMs for automated text simplification despite current shortcomings.

Abstract

Generative artificial intelligence (GenAI) holds great promise as a tool to support personalized learning. Teachers need tools to efficiently and effectively enhance the readability of educational texts so that they match individual students' reading levels while retaining key details. Large Language Models (LLMs) show potential to fill this need, but previous research notes multiple shortcomings in current approaches. In this study, we introduced a generalized approach and metrics for systematically evaluating the accuracy and consistency with which LLMs, prompting techniques, and a novel multi-agent architecture simplify sixty informational reading passages, reducing each from the twelfth-grade level to the eighth-, sixth-, and fourth-grade levels. We calculated the degree to which each LLM and prompting technique accurately achieved the targeted grade level for…
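To make the kind of measurement described above concrete, here is a minimal Python sketch of two checks: how far a simplified passage lands from its target grade level, and what share of key phrases it keeps. This excerpt does not say which readability formula or key-phrase method the authors used, so the Flesch-Kincaid grade formula, the rough syllable heuristic, and every function name below are assumptions chosen for illustration, not the paper's implementation.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level, used here as a stand-in readability metric."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def grade_error(text: str, target_grade: float) -> float:
    """Signed distance from the target: positive means the text is still too hard."""
    return flesch_kincaid_grade(text) - target_grade


def key_phrase_retention(text: str, key_phrases: list[str]) -> float:
    """Fraction of key phrases found verbatim (case-insensitive) in the text."""
    if not key_phrases:
        return 1.0
    lowered = text.lower()
    kept = sum(1 for phrase in key_phrases if phrase.lower() in lowered)
    return kept / len(key_phrases)


if __name__ == "__main__":
    simplified = "Plants use sunlight to make food. This is called photosynthesis."
    print(grade_error(simplified, target_grade=4))
    print(key_phrase_retention(simplified, ["sunlight", "photosynthesis", "chlorophyll"]))
```

Running the same checks over repeated generations for the same passage and target would give a simple view of consistency, for example the spread of `grade_error` values across runs.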

Taxonomy

Topics: Text Readability and Simplification