3CEL: A corpus of legal Spanish contract clauses
Nuria Aldama García, Patricia Marsà Morales, David Betancur Sánchez, Álvaro Barbero Jiménez, Marta Guerrero Nieto, Pablo Haya Coll, Patricia Martín Chozas, Elena Montiel Ponsoda

TL;DR
This paper introduces 3CEL, a new annotated corpus of Spanish legal contract clauses designed to facilitate NLP tasks in legal domain understanding and review, addressing resource scarcity in Spanish legal NLP.
Contribution
The paper presents the creation and annotation of 3CEL, a comprehensive legal Spanish contract clause corpus with detailed tagging for contract information extraction.
Findings
Contains 373 manually annotated tenders
Includes 19 categories with 4,782 total tags
Aims to improve legal NLP in Spanish
Abstract
Legal corpora for Natural Language Processing (NLP) are valuable and scarce resources in languages like Spanish for two main reasons: data accessibility and legal expert knowledge availability. INESData 2024 is a European Union-funded project led by the Universidad Politécnica de Madrid (UPM) and developed by Instituto de Ingeniería del Conocimiento (IIC) to create a series of state-of-the-art NLP resources applied to the legal/administrative domain in Spanish. The goal of this paper is to present the Corpus of Legal Spanish Contract Clauses (3CEL), which is a contract information extraction corpus developed within the framework of INESData 2024. 3CEL contains 373 manually annotated tenders using 19 defined categories (4,782 total tags) that identify key information for contract understanding and reviewing.
Taxonomy
Topics: Law, logistics, and international trade · Legal Language and Interpretation · Artificial Intelligence in Law
