Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts
Donya Rooein, Paul Rottger, Anastassia Shaitarova, Dirk Hovy

TL;DR
This paper introduces prompt-based metrics leveraging large language models to improve the classification of educational text difficulty, surpassing traditional static metrics like Flesch-Kincaid.
Contribution
It proposes a novel prompt-based approach for measuring text difficulty, with metrics derived from a user study and evaluated in regression experiments.
Findings
Prompt-based metrics outperform static metrics in difficulty classification
LLMs effectively capture complex features of text difficulty
Significant improvement in classification accuracy with prompt-based metrics
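To make the idea of a prompt-based metric concrete, here is a minimal sketch in Python. It is not the authors' implementation: the prompt wording, the 1 to 5 scale, the function name `prompt_based_difficulty`, and the `llm` callable (any function that maps a prompt string to a reply string) are all assumptions made for illustration.

```python
import re
from typing import Callable

# Hypothetical prompt wording, written for illustration only;
# it is NOT taken from the paper.
PROMPT_TEMPLATE = (
    "Rate the difficulty of the following educational text on a scale "
    "from {low} (very easy) to {high} (very hard). "
    "Answer with a single integer.\n\nText:\n{text}"
)


def prompt_based_difficulty(
    text: str,
    llm: Callable[[str], str],
    low: int = 1,
    high: int = 5,
) -> int:
    """Ask an LLM for a difficulty rating and parse it into an integer.

    `llm` is an assumed interface: a callable that takes a prompt string
    and returns the model's reply as a string.
    """
    prompt = PROMPT_TEMPLATE.format(low=low, high=high, text=text)
    reply = llm(prompt)

    # Take the first integer that appears in the reply.
    match = re.search(r"-?\d+", reply)
    if match is None:
        raise ValueError(f"no integer rating found in reply: {reply!r}")

    # Clamp to the requested scale in case the model answers out of range.
    return min(max(int(match.group()), low), high)


if __name__ == "__main__":
    # Stand-in for a real model call, so the sketch runs offline.
    def fake_llm(prompt: str) -> str:
        return "Score: 4"

    print(prompt_based_difficulty("Mitochondria produce ATP.", fake_llm))
```

Passing the model as a callable keeps the sketch independent of any particular LLM provider; a real setup would replace `fake_llm` with a function that sends the prompt to a model and returns its text reply.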
Abstract
Using large language models (LLMs) for educational applications like dialogue-based teaching is a hot topic. Effective teaching, however, requires teachers to adapt the difficulty of content and explanations to the education level of their students. Even the best LLMs today struggle to do this well. If we want to improve LLMs on this adaptation task, we need to be able to measure adaptation success reliably. However, current Static metrics for text difficulty, like the Flesch-Kincaid Reading Ease score, are known to be crude and brittle. We, therefore, introduce and evaluate a new set of Prompt-based metrics for text difficulty. Based on a user study, we create Prompt-based metrics as inputs for LLMs. They leverage LLMs' general language understanding capabilities to capture more abstract and complex features than Static metrics. Regression experiments show that adding our Prompt-based…
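For reference, a static metric of the kind the abstract calls crude can be computed from surface counts alone. The sketch below uses the standard Flesch Reading Ease formula, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). The syllable counter is a rough vowel-group heuristic and the sentence splitter is a simple regex; both are assumptions for illustration, and neither reflects the tooling used in the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final 'e' as silent (as in "make"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    # Naive sentence split on terminal punctuation (an assumption of this sketch).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    easy = "The cat sat on the mat."
    hard = "Photosynthesis converts electromagnetic radiation into chemical energy."
    print(flesch_reading_ease(easy))
    print(flesch_reading_ease(hard))
```

Because the score depends only on sentence length and syllable counts, it cannot tell a short sentence full of unexplained jargon from a genuinely simple one, which illustrates the kind of brittleness the abstract attributes to static metrics.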
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
