Resource for Error Analysis in Text Simplification: New Taxonomy and Test Collection

Benjamin Vendeville; Liana Ermakova; Pierre De Loor

arXiv:2505.16392 · cs.CL · May 23, 2025

TL;DR

This paper introduces a new error taxonomy and a test collection for evaluating errors in automatic text simplification, aiming to improve the assessment and development of more reliable simplification models.

Contribution

The paper presents a novel error taxonomy, a human-annotated dataset of simplified scientific texts, and an analysis of how well models detect and classify simplification errors.

Findings

01. Existing metrics do not correlate well with error presence
02. The dataset enables better evaluation of simplification errors
03. Models show varying effectiveness in error detection and classification
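
The first finding can be probed by correlating a metric's per-sample score with a binary "contains an error" label. The following is a minimal Python sketch, assuming a point-biserial correlation (Pearson correlation with one binary variable) is an acceptable measure; this summary does not describe the paper's own procedure. The names `pearson`, `metric_scores`, and `has_error`, along with the toy values, are hypothetical illustrations and not data from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    With one binary sequence this equals the point-biserial correlation.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values: one metric score per simplified text,
# and 1 if an annotator flagged any error in that text, else 0.
metric_scores = [0.62, 0.58, 0.71, 0.66, 0.60, 0.69]
has_error = [1, 0, 1, 0, 0, 1]

r = pearson(metric_scores, has_error)
print(f"point-biserial r = {r:.3f}")
```

A value of r near zero on real annotations would be consistent with the finding; a metric that tracks errors well would instead show a clearly negative r, since higher scores should accompany fewer errors.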

Abstract

The general public often encounters complex texts but does not have the time or expertise to fully understand them, leading to the spread of misinformation. Automatic Text Simplification (ATS) helps make information more accessible, but its evaluation methods have not kept up with advances in text generation, especially with Large Language Models (LLMs). In particular, recent studies have shown that current ATS metrics do not correlate with the presence of errors. Manual inspections have further revealed a variety of errors, underscoring the need for a more nuanced evaluation framework, which is currently lacking. This resource paper addresses this gap by introducing a test collection for detecting and classifying errors in simplified texts. First, we propose a taxonomy of errors, with a formal focus on information distortion. Next, we introduce a parallel dataset of automatically…
