Reading Between the Lines: A dataset and a study on why some texts are tougher than others
Nouran Khallaf, Carlo Eugeni, Serge Sharoff

TL;DR
This paper introduces a dataset and study on text difficulty for individuals with intellectual disabilities, using annotated texts and transformer models to classify and interpret simplification strategies.
Contribution
It presents a novel annotated dataset based on psychological and translation research, and fine-tunes transformer models for classifying text difficulty and simplification strategies.
Findings
Transformer models can predict simplification strategies with reasonable accuracy.
The annotated dataset helps in understanding text difficulty for cognitively impaired readers.
Interpretability analysis offers insights into model decision-making processes.
Abstract
Our research aims at better understanding what makes a text difficult to read for specific audiences with intellectual disabilities, more specifically, people who have limitations in cognitive functioning, such as reading and understanding skills, an IQ below 70, and challenges in conceptual domains. We introduce a scheme for the annotation of difficulties which is based on empirical research in psychology as well as on research in translation studies. The paper describes the annotated dataset, primarily derived from the parallel texts (standard English and Easy to Read English translations) made available online. We fine-tuned four different pre-trained transformer models to perform the task of multiclass classification to predict the strategies required for simplification. We also investigate the possibility of interpreting the decisions of this language model when it is aimed at…
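To make the multiclass setup mentioned in the abstract concrete, the following is a minimal sketch of how a classifier's raw output scores (logits) could be mapped to a single predicted simplification strategy. It is an illustration under stated assumptions, not the authors' implementation: the label names in `LABELS` and the example logits are hypothetical, and in practice the logits would come from a fine-tuned transformer rather than being hard-coded.

```python
import math

# Hypothetical strategy labels, for illustration only.
# The paper's actual annotation scheme is not listed in this summary.
LABELS = ["no_change", "lexical_substitution", "sentence_splitting", "explanation"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_strategy(logits, labels=LABELS):
    """Return the most probable label and its probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Example logits (assumed values standing in for a model's output).
logits = [0.2, 2.1, 0.4, -1.0]
label, confidence = predict_strategy(logits)
print(label, round(confidence, 3))
```

In a multiclass setting each text segment receives exactly one label, which is why the sketch takes the single highest-probability class rather than thresholding each class independently.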
Taxonomy
Topics: Natural Language Processing Techniques
