A Readable Read: Automatic Assessment of Language Learning Materials based on Linguistic Complexity
Ildikó Pilán, Sowmya Vajjala, Elena Volodina

TL;DR
This paper presents a machine learning approach to automatically assess the linguistic complexity of Swedish language learning texts and sentences, enabling better resource selection for learners at different proficiency levels.
Contribution
It introduces the first supervised model for predicting Swedish text difficulty, outperforming traditional readability measures and adapting to sentence-level assessment.
Findings
Achieved 81.3% accuracy at document level
Attained 63.4% accuracy at sentence level with 92% adjacent accuracy
Combining linguistic features improves classification performance
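
The summary does not define "adjacent accuracy". A common reading, assumed here, is the share of predictions that land on the gold level or on a directly neighbouring one. The minimal sketch below contrasts it with exact accuracy; encoding the five levels as integers 1 to 5 and the function names are assumptions for illustration, not taken from the paper.

```python
def exact_accuracy(gold, pred):
    """Fraction of items whose predicted level equals the gold level."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def adjacent_accuracy(gold, pred):
    """Fraction of items predicted at the gold level or one level away.

    Assumes levels are ordinal integers (here 1..5), so that a distance
    of at most 1 means "same or neighbouring level".
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(abs(g - p) <= 1 for g, p in zip(gold, pred)) / len(gold)


# Hypothetical labels, for illustration only (not data from the paper).
gold_levels = [1, 2, 3, 4, 5]
pred_levels = [1, 3, 5, 4, 4]

print(exact_accuracy(gold_levels, pred_levels))     # 0.4
print(adjacent_accuracy(gold_levels, pred_levels))  # 0.8
```

Under this reading, adjacent accuracy is always at least as high as exact accuracy, which is consistent with the gap between the two sentence-level figures above.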
Abstract
Corpora and web texts can become a rich language learning resource if we have a means of assessing whether they are linguistically appropriate for learners at a given proficiency level. In this paper, we aim at addressing this issue by presenting the first approach for predicting linguistic complexity for Swedish second language learning material on a 5-point scale. After showing that the traditional Swedish readability measure, Läsbarhetsindex (LIX), is not suitable for this task, we propose a supervised machine learning model, based on a range of linguistic features, that can reliably classify texts according to their difficulty level. Our model obtained an accuracy of 81.3% and an F-score of 0.8, which is comparable to the state of the art in English and is considerably higher than previously reported results for other languages. We further studied the utility of our features with…
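
For readers unfamiliar with the LIX baseline mentioned in the abstract, the standard formula adds the average sentence length in words to the percentage of long words (more than six characters). The sketch below is a rough illustration of that formula only: the regex-based sentence and word splitting is a simplifying assumption, not the preprocessing used in the paper, and the function name `lix` is hypothetical.

```python
import re


def lix(text, long_word_len=6):
    """Compute the LIX readability score for a text.

    LIX = (words / sentences) + 100 * (long words / words),
    where a long word has more than `long_word_len` characters.
    Tokenization here is deliberately naive (an assumption of this sketch).
    """
    # Split on runs of sentence-final punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Runs of Unicode letters count as words (digits and underscores excluded).
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    long_words = [w for w in words if len(w) > long_word_len]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)


# 8 words, 2 sentences, 3 long words: 8/2 + 100*3/8 = 41.5
print(lix("Jag bor i Sverige. Universitetet ligger i Stockholm."))
```

Because the score depends only on sentence length and word length, it ignores the lexical, morphological, and syntactic information that a feature-based classifier can draw on.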
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques
