Investigating Text Simplification Evaluation
Laura Vásquez-Rodríguez, Matthew Shardlow, Piotr Przybyła, Sophia Ananiadou

TL;DR
This paper critically examines current text simplification evaluation methods and datasets, revealing inconsistencies and proposing improvements for more robust and accurate TS models and assessments.
Contribution
The study analyzes existing TS corpora and evaluation metrics, and demonstrates that dataset distribution improvements lead to more reliable TS model performance.
Findings
Current TS datasets have significant distribution differences.
Evaluation metrics like BLEU and SARI do not align well with human judgments.
Improving dataset distribution enhances TS model robustness.
Abstract
Modern text simplification (TS) heavily relies on the availability of gold standard data to build machine learning models. However, existing studies show that parallel TS corpora contain inaccurate simplifications and incorrect alignments. Additionally, evaluation is usually performed by using metrics such as BLEU or SARI to compare system output to the gold standard. A major limitation is that these metrics do not match human judgements, and performance varies greatly across datasets and linguistic phenomena. Furthermore, our research shows that the test and training subsets of parallel datasets differ significantly. In this work, we investigate existing TS corpora, providing new insights that will motivate the improvement of existing state-of-the-art TS evaluation methods. Our contributions include the analysis of TS corpora based on existing modifications used for…
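The claim that training and test subsets differ can be checked with a simple distribution comparison. The sketch below is illustrative only and is not taken from the paper: it assumes a token-level Levenshtein distance as the measure of how much each sentence was modified, and a two-sample Kolmogorov-Smirnov statistic as the measure of how far apart two subsets are. The function names (`token_edit_distance`, `change_ratio`, `ks_statistic`) and the toy sentence pairs are hypothetical.

```python
from bisect import bisect_right


def token_edit_distance(a, b):
    """Levenshtein distance between two token lists (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def change_ratio(original, simplified):
    """Edit distance normalised by the longer sentence; 0.0 means unchanged."""
    a, b = original.lower().split(), simplified.lower().split()
    longest = max(len(a), len(b))
    return token_edit_distance(a, b) / longest if longest else 0.0


def ks_statistic(xs, ys):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    for v in xs + ys:
        cdf_x = bisect_right(xs, v) / len(xs)
        cdf_y = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(cdf_x - cdf_y))
    return gap


# Toy (original, simplified) pairs, made up for illustration.
train_pairs = [
    ("the cat sat on the mat", "the cat sat on the mat"),
    ("he purchased a vehicle", "he bought a car"),
]
test_pairs = [
    ("the committee convened to deliberate", "the group met to talk"),
    ("she resides in the vicinity", "she lives nearby"),
]

train_ratios = [change_ratio(o, s) for o, s in train_pairs]
test_ratios = [change_ratio(o, s) for o, s in test_pairs]
print("KS statistic (train vs test):", ks_statistic(train_ratios, test_ratios))
```

A value near 0 suggests the two subsets were edited to a similar degree; a value near 1 suggests that one subset contains mostly light edits and the other mostly heavy rewrites. On real corpora a library implementation that also reports a p-value would be preferable to this hand-rolled statistic.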
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling
Methods: Spatio-temporal stability analysis
