Lexical Simplification Benchmarks for English, Portuguese, and Spanish
Sanja Stajner, Daniel Ferres, Matthew Shardlow, Kai North, Marcos Zampieri, Horacio Saggion

TL;DR
This paper introduces a new benchmark dataset for lexical simplification in English, Spanish, and Portuguese, enabling cross-lingual system comparison and revealing neural systems' superior performance, especially in English.
Contribution
It provides the first multilingual dataset for lexical simplification and adapts two advanced systems to evaluate their effectiveness across three languages.
Findings
Neural systems outperform non-neural systems in all three languages.
Neural systems perform significantly better in English than in Spanish and Portuguese.
The dataset enables direct comparison of lexical simplification systems across languages.
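As a rough illustration of what a cross-language comparison on a shared benchmark could look like, the sketch below computes an accuracy-at-1 style score per language: the share of instances whose top-ranked system substitute appears among the gold substitutes. The choice of metric, the function name `accuracy_at_1`, the instance layout, and the toy data are all assumptions made for this example. None of them is taken from the paper or its dataset.

```python
from typing import Dict, List, Set, Tuple

# One instance: (language code, gold substitutes, ranked system predictions).
# This layout is hypothetical and is not the dataset's actual format.
Instance = Tuple[str, Set[str], List[str]]


def accuracy_at_1(instances: List[Instance]) -> Dict[str, float]:
    """Per language, the share of instances whose top prediction is a gold substitute."""
    hits: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for lang, gold, ranked in instances:
        totals[lang] = totals.get(lang, 0) + 1
        gold_lower = {g.lower() for g in gold}
        # An empty prediction list counts as a miss.
        if ranked and ranked[0].lower() in gold_lower:
            hits[lang] = hits.get(lang, 0) + 1
    return {lang: hits.get(lang, 0) / totals[lang] for lang in totals}


# Invented toy data, for illustration only (not drawn from the benchmark).
toy = [
    ("en", {"help", "aid"}, ["help", "support"]),
    ("en", {"buy"}, ["acquire", "buy"]),
    ("es", {"ayudar"}, ["ayudar"]),
    ("pt", {"comprar"}, []),
]

scores = accuracy_at_1(toy)
print(scores)
```

Because the same scoring function runs over every language, the resulting numbers are directly comparable, provided the instances in each language were collected and annotated in the same way.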
Abstract
Even in highly-developed countries, as many as 15-30% of the population can only understand texts written using a basic vocabulary. Their understanding of everyday texts is limited, which prevents them from taking an active role in society and making informed decisions regarding healthcare, legal representation, or democratic choice. Lexical simplification is a natural language processing task that aims to make text understandable to everyone by replacing complex vocabulary and expressions with simpler ones, while preserving the original meaning. It has attracted considerable attention in the last 20 years, and fully automatic lexical simplification systems have been proposed for various languages. The main obstacle for the progress of the field is the absence of high-quality datasets for building and evaluating lexical simplification systems. We present a new benchmark dataset for…
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling
