LengClaro2023: A Dataset of Administrative Texts in Spanish with Plain Language adaptations
Belén Agüera-Marco, Itziar Gonzalez-Dios

TL;DR
LengClaro2023 is a dataset of Spanish legal-administrative texts, each paired with two simplified versions, intended for evaluating automatic text simplification (ATS) systems and supporting more accessible administrative language.
Contribution
The paper provides a new dataset of Spanish legal-administrative texts, each with two simplified versions, to support research on automatic text simplification of legal-administrative language.
Findings
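The dataset pairs original legal-administrative texts in Spanish with simplified versions; the texts are based on the most frequently used procedures from the Spanish Social Security website
Each text has two simplified versions: the first follows the recommendations of arText claro, and the second adds further recommendations from plain language guidelines
The resource supports the evaluation of automatic text simplification (ATS) systems in Spanish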
Abstract
In this work, we present LengClaro2023, a dataset of legal-administrative texts in Spanish. Based on the most frequently used procedures from the Spanish Social Security website, we have created for each text two simplified equivalents. The first version follows the recommendations provided by arText claro. The second version incorporates additional recommendations from plain language guidelines to explore further potential improvements in the system. The linguistic resource created in this work can be used for evaluating automatic text simplification (ATS) systems in Spanish.
Taxonomy
Topics: Text Readability and Simplification · Authorship Attribution and Profiling · Natural Language Processing Techniques
