A Spanish Tagset for the CRATER Project
Fernando Sánchez León (Laboratorio de Lingüística Informática, Facultad de Filosofía y Letras, Universidad Autónoma de Madrid)

TL;DR
This working paper describes the Spanish tagset devised for the CRATER project, which is building a multilingual (English, French, Spanish) aligned corpus and plans to adapt the Xerox PARC tagger to Spanish.
Contribution
It presents a new Spanish tagset for the CRATER project and describes the plan to adapt the Xerox PARC tagger to Spanish so that it can tag the Spanish version of the corpus.
Findings
The tagset was devised as the ideal one for Spanish and has been posted to several lists to gather feedback.
The Xerox PARC tagger will be adapted to Spanish to tag the Spanish version of the corpus; the adaptation is planned rather than reported as complete.
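The tagging supports the creation of a multilingual (English, French, Spanish) aligned corpus built from the International Telecommunications Union corpus.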
Abstract
This working paper describes the Spanish tagset to be used in the context of CRATER, a CEC-funded project that aims to create a multilingual (English, French, Spanish) aligned corpus from the International Telecommunications Union corpus. To this end, each version of the corpus will be (or is currently being) tagged. The Xerox PARC tagger will be adapted to Spanish in order to tag the Spanish version. The tagset has been devised as the ideal one for Spanish and has been posted to several lists to gather feedback on it.
Taxonomy
Topics: Speech and dialogue systems
