LengClaro2023: A Dataset of Administrative Texts in Spanish with Plain Language adaptations

Belén Agüera-Marco; Itziar Gonzalez-Dios

arXiv:2506.05927·cs.CL·June 9, 2025

Open Access

TL;DR

LengClaro2023 introduces a Spanish legal-administrative text dataset with simplified versions to evaluate automatic text simplification systems, aiding accessibility and comprehension.

Contribution

The paper provides a new dataset of Spanish administrative texts, each paired with two simplified versions, to support research on automatic text simplification for legal-administrative language.

Findings

01

The dataset pairs original Spanish legal-administrative texts with simplified versions

02

Each text has two simplified versions: one following the arText claro recommendations, and one that also applies recommendations from plain language guidelines

03

The resource supports evaluation of automatic text simplification (ATS) systems in Spanish

Abstract

In this work, we present LengClaro2023, a dataset of legal-administrative texts in Spanish. Based on the most frequently used procedures from the Spanish Social Security website, we have created for each text two simplified equivalents. The first version follows the recommendations provided by arText claro. The second version incorporates additional recommendations from plain language guidelines to explore further potential improvements in the system. The linguistic resource created in this work can be used for evaluating automatic text simplification (ATS) systems in Spanish.

Taxonomy

Topics: Text Readability and Simplification · Authorship Attribution and Profiling · Natural Language Processing Techniques