CEFR-Based Sentence Difficulty Annotation and Assessment

Yuki Arase; Satoru Uchida; Tomoyuki Kajiwara

arXiv:2210.11766·cs.CL·October 24, 2022

TL;DR

This paper introduces CEFR-SP, a corpus of 17,000 English sentences annotated with CEFR levels, and proposes a sentence-level assessment model that handles the unbalanced level distribution, achieving a macro-F1 score of 84.5% and outperforming strong readability-assessment baselines.

Contribution

The paper contributes the CEFR-SP corpus, whose sentences are annotated with CEFR levels by English-education professionals, and a sentence-level difficulty assessment model designed for the unbalanced distribution of those levels.

Findings

01 Achieved a macro-F1 score of 84.5% in level assessment

02 Outperformed strong baselines in readability assessment

03 Provided a valuable resource for controllable text simplification

Abstract

Controllable text simplification is a crucial assistive technique for language learning and teaching. One of the primary factors hindering its advancement is the lack of a corpus annotated with sentence difficulty levels based on language ability descriptions. To address this problem, we created the CEFR-based Sentence Profile (CEFR-SP) corpus, containing 17k English sentences annotated with the levels based on the Common European Framework of Reference for Languages assigned by English-education professionals. In addition, we propose a sentence-level assessment model to handle unbalanced level distribution because the most basic and highly proficient sentences are naturally scarce. In the experiments in this study, our method achieved a macro-F1 score of 84.5% in the level assessment, thus outperforming strong baselines employed in readability assessment.
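The abstract reports macro-F1 rather than accuracy, which matters when the rarest levels have few sentences. The following is a minimal sketch of how macro-F1 is computed over the six CEFR levels. It is not the authors' assessment model; the label counts and the `pred` list are invented toy data, and the function names `macro_f1` and `accuracy` are illustrative.

```python
# Toy illustration of macro-F1 on an unbalanced label distribution.
# All data below is invented for illustration; it is not taken from CEFR-SP.

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def macro_f1(gold, pred, labels=CEFR_LEVELS):
    """Unweighted mean of per-level F1 scores, so every level counts equally."""
    f1_scores = []
    for level in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == level and p == level)
        fp = sum(1 for g, p in pairs if g != level and p == level)
        fn = sum(1 for g, p in pairs if g == level and p != level)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


def accuracy(gold, pred):
    """Fraction of sentences whose predicted level matches the gold level."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


# Assumed toy distribution: the lowest and highest levels are scarce.
gold = ["A1"] * 1 + ["A2"] * 4 + ["B1"] * 10 + ["B2"] * 10 + ["C1"] * 4 + ["C2"] * 1

# A hypothetical classifier that never predicts the two rarest levels.
pred = ["A2" if g == "A1" else "C1" if g == "C2" else g for g in gold]

acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
print(f"accuracy = {acc:.3f}, macro-F1 = {mf1:.3f}")
```

On this toy data the classifier is right on 28 of 30 sentences, yet its macro-F1 is much lower because A1 and C2 each contribute an F1 of zero to the average. This is why a macro-averaged score is a stricter measure when level frequencies are skewed.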

Code & Models

Repositories

yukiar/cefr-sp (PyTorch, official)

Taxonomy

Topics: Text Readability and Simplification