# Text Readability Assessment for Second Language Learners

**Authors:** Menglin Xia, Ekaterina Kochmar, Ted Briscoe

arXiv: 1906.07580 · 2019-06-19

## TL;DR

This study develops a readability assessment model for second language learners by creating a new CEFR-graded dataset and applying domain adaptation techniques to improve performance on limited L2 data.

## Contribution

The paper introduces a new CEFR-graded dataset of texts for L2 learners of English and shows that domain adaptation can make readability assessment effective when little L2 data is available.

## Key findings

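- Best model achieves an accuracy of 0.797 on learner (L2) texts
- The same model achieves a Pearson correlation coefficient (PCC) of 0.938 on learner texts
- Domain adaptation and self-learning use native data to improve performance on the limited L2 data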
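
The two reported metrics can be illustrated with a minimal sketch. It assumes CEFR levels are mapped to integers on an ordinal scale (the mapping `CEFR_TO_INT` and the toy labels below are illustrative assumptions, not the paper's data or evaluation code). Accuracy counts exact level matches, while the Pearson correlation coefficient (PCC) measures how closely predicted levels track gold levels linearly.

```python
import math

# Hypothetical ordinal encoding of the six CEFR levels (an assumption for illustration).
CEFR_TO_INT = {"A1": 0, "A2": 1, "B1": 2, "B2": 3, "C1": 4, "C2": 5}


def accuracy(gold, pred):
    """Fraction of texts whose predicted level exactly matches the gold level."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy labels, invented for this example only.
gold = ["A2", "B1", "B2", "C1", "C2"]
pred = ["A2", "B1", "B1", "C1", "C2"]

acc = accuracy(gold, pred)
pcc = pearson([CEFR_TO_INT[g] for g in gold], [CEFR_TO_INT[p] for p in pred])
print(f"accuracy={acc:.3f} PCC={pcc:.3f}")
```

Because PCC rewards predictions that are off by one level more than predictions that are off by several, it can be much higher than accuracy on the same output, which is consistent with the gap between the two reported figures.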

## Abstract

This paper addresses the task of readability assessment for texts aimed at second language (L2) learners. One of the major challenges in this task is the lack of level-annotated data of significant size. For the present work, we collected a dataset of CEFR-graded texts tailored for learners of English as an L2 and investigated text readability assessment for both native speakers and L2 learners. We applied a generalization method to adapt models trained on larger native corpora to estimate text readability for learners, and explored domain adaptation and self-learning techniques to make use of the native data to improve system performance on the limited L2 data. In our experiments, the best performing model for readability on learner texts achieves an accuracy of 0.797 and a PCC of 0.938.
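
The general idea behind self-learning (self-training) can be sketched as follows. This is not the authors' implementation: it assumes a single hypothetical numeric feature per text and a toy nearest-centroid classifier, and the names `fit_centroids`, `predict_with_margin`, `self_train`, and `min_margin` are invented for illustration. A model trained on the small labeled set pseudo-labels an unlabeled pool, keeps only confident predictions, and retrains.

```python
def fit_centroids(features, labels):
    """Compute the mean feature value per label (a toy 1-D nearest-centroid model)."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_margin(centroids, x):
    """Return the nearest label and a confidence margin (gap to the runner-up)."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, float("inf")
    margin = abs(x - centroids[ranked[1]]) - abs(x - centroids[best])
    return best, margin


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=5):
    """Iteratively add confidently pseudo-labeled examples to the training set."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(train_x, train_y)
        remaining, added = [], 0
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            if margin >= min_margin:
                # Confident prediction: treat it as a training example.
                train_x.append(x)
                train_y.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:
            break  # Nothing new was learned; stop early.
    return fit_centroids(train_x, train_y), pool


# Toy data, invented for this example only.
centroids, leftover = self_train(
    labeled_x=[1.0, 2.0, 8.0, 9.0],
    labeled_y=["easy", "easy", "hard", "hard"],
    unlabeled_x=[0.5, 5.0, 10.0],
)
print(centroids, leftover)
```

The confidence threshold is the key design choice in this sketch: a low `min_margin` admits more pseudo-labeled data but risks reinforcing the model's own mistakes, while a high one keeps the training set clean at the cost of using less of the unlabeled pool.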

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.07580/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1906.07580/full.md

---
Source: https://tomesphere.com/paper/1906.07580