A Spanish Tagset for the CRATER Project

Fernando Sánchez León (Laboratorio de Lingüística Informática, Facultad de Filosofía y Letras, Universidad Autónoma de Madrid)

arXiv:cmp-lg/9406023 · cmp-lg · August 14, 2016 · 6 cites

Open Access

TL;DR

This paper introduces a specialized Spanish tagset designed for the CRATER project, facilitating multilingual corpus creation and adaptation of existing tools for Spanish language processing.

Contribution

It presents a new Spanish tagset devised for the CRATER project and states that the Xerox PARC tagger will be adapted to Spanish to annotate the Spanish version of the corpus.

Findings

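1. The tagset has been devised as the ideal one for Spanish and has been posted to several lists for feedback.

2. The Xerox PARC tagger will be adapted to Spanish to tag the Spanish version of the corpus.

3. The tagged text will form part of a multilingual (English, French, Spanish) aligned corpus built from the International Telecommunications Union corpus.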

Abstract

This working paper describes the Spanish tagset to be used in the context of CRATER, a CEC-funded project aiming at the creation of a multilingual (English, French, Spanish) aligned corpus using the International Telecommunications Union corpus. To this end, each version of the corpus will be (or is currently being) tagged. The Xerox PARC tagger will be adapted to Spanish in order to tag the Spanish version. This tagset has been devised as the ideal one for Spanish, and has been posted to several lists in order to get feedback on it.

Taxonomy

Topics: Speech and dialogue systems