A User-Centered Evaluation of Spanish Text Simplification

Adrian de Wynter; Anthony Hevia; Si-Qing Chen

arXiv:2308.07556 · cs.CL · August 16, 2023

TL;DR

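This paper evaluates Spanish text simplification for a production system using two corpora, one for complex-sentence identification and one for complex-word identification, and compares Spanish-specific readability scores with neural networks. Neural models predict user preferences better, but all models still rely too often on spurious statistical features such as sentence length.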

Contribution

It presents a user-centered evaluation of Spanish text simplification, compares the most prevalent Spanish-specific readability scores with neural models, and releases the evaluation corpora for future research.

Findings

01

Neural networks outperform traditional readability scores in predicting user preferences.

02

Multilingual models underperform compared to Spanish-only models on the same task.

03

Models often focus on spurious features like sentence length rather than meaningful simplification.

Abstract

We present an evaluation of text simplification (TS) in Spanish for a production system, by means of two corpora focused on both complex-sentence and complex-word identification. We compare the most prevalent Spanish-specific readability scores with neural networks, and show that the latter are consistently better at predicting user preferences regarding TS. As part of our analysis, we find that multilingual models underperform against equivalent Spanish-only models on the same task, yet all models focus too often on spurious statistical features, such as sentence length. We release the corpora in our evaluation to the broader community in the hope of pushing forward the state of the art in Spanish natural language processing.
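The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch of one way to check how far sentence length alone goes in predicting user preferences. It assumes, hypothetically, that preferences come as pairs of sentences with a label for the one users found simpler. The names `length_score`, `preference_agreement`, and `toy_pairs` are illustrative and do not come from the paper or its repository, and the example sentences are made up rather than drawn from the released corpora.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical data layout: (sentence_a, sentence_b, preferred), preferred in {"a", "b"}.
Pair = Tuple[str, str, str]


def length_score(sentence: str) -> float:
    """Toy complexity score: the number of whitespace-separated tokens."""
    return float(len(sentence.split()))


def preference_agreement(
    pairs: Sequence[Pair], complexity: Callable[[str], float]
) -> float:
    """Fraction of pairs where the lower-complexity sentence is the preferred one.

    Ties in the complexity score count as disagreement.
    """
    if not pairs:
        raise ValueError("pairs must be non-empty")
    hits = 0
    for sentence_a, sentence_b, preferred in pairs:
        score_a, score_b = complexity(sentence_a), complexity(sentence_b)
        if score_a == score_b:
            continue  # no prediction possible, counted as a miss
        predicted = "a" if score_a < score_b else "b"
        hits += predicted == preferred
    return hits / len(pairs)


# Made-up pairs for illustration only.
toy_pairs = [
    ("El gato duerme.", "El felino se encuentra en estado de reposo.", "a"),
    ("Se procedió a la apertura del establecimiento.", "Abrieron la tienda.", "b"),
    ("Llovió, así que nos quedamos en casa.", "Llovió. Nos quedamos.", "a"),
]

agreement = preference_agreement(toy_pairs, length_score)
print(f"Length-only agreement: {agreement:.2f}")
```

Any other scorer, such as a readability formula or a model's predicted complexity, can be passed in place of `length_score`. If its agreement barely exceeds the length-only figure, that suggests it may be leaning on sentence length rather than on meaningful simplification.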

Code & Models

Repositories

microsoft/breve-claro (Official)

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques

Methods: Spatio-temporal stability analysis · Focus