A User-Centered Evaluation of Spanish Text Simplification
Adrian de Wynter, Anthony Hevia, Si-Qing Chen

TL;DR
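This paper evaluates Spanish text simplification for a production system using two corpora covering complex-sentence and complex-word identification. It compares Spanish-specific readability scores with neural networks and finds that neural models predict user preferences better, although all models still rely too often on spurious features such as sentence length.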
Contribution
It presents a user-centered evaluation of Spanish text simplification that compares Spanish-specific readability scores with neural models, and releases the evaluation corpora for future research.
Findings
Neural networks outperform traditional readability scores in predicting user preferences.
Multilingual models underperform equivalent Spanish-only models on the same task.
Models often focus on spurious features like sentence length rather than meaningful simplification.
Abstract
We present an evaluation of text simplification (TS) in Spanish for a production system, by means of two corpora focused on both complex-sentence and complex-word identification. We compare the most prevalent Spanish-specific readability scores with neural networks, and show that the latter are consistently better at predicting user preferences regarding TS. As part of our analysis, we find that multilingual models underperform against equivalent Spanish-only models on the same task, yet all models focus too often on spurious statistical features, such as sentence length. We release the corpora in our evaluation to the broader community in the hope of pushing forward the state of the art in Spanish natural language processing.
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques
Methods: Spatio-temporal stability analysis · Focus
