Language Resources in Spanish for Automatic Text Simplification across Domains

Antonio Moreno-Sandoval; Leonardo Campillos-Llanos; Ana García-Serrano

arXiv:2409.20466 · cs.CL · October 1, 2024

Open Access

TL;DR

This paper presents new Spanish language resources, corpora, and tools for automatic text simplification across finance, medicine, and history, facilitating domain-specific simplification tasks.

Contribution

It introduces domain-specific corpora, annotation guidelines, a medical lexicon, datasets for shared tasks, and two simplification tools, all publicly available.

Findings

01 Created multiple domain-specific corpora and guidelines

02 Developed a medical lexicon for simplification

03 Built two automatic simplification tools

Abstract

This work describes the language resources and models developed for automatic simplification of Spanish texts in three domains: Finance, Medicine and History studies. We created several corpora in each domain, annotation and simplification guidelines, a lexicon of technical and simplified medical terms, datasets used in shared tasks for the financial domain, and two simplification tools. The methodology, resources and companion publications are publicly available on the project website: https://clara-nlp.uned.es/.

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques