Apurinã Universal Dependencies Treebank
Jack Rueter, Marília Fernanda Pereira de Freitas, Sidney da Silva Facundes, Mika Hämäläinen, Niko Partanen

TL;DR
This paper introduces the first Universal Dependencies treebank for the Apurinã language, providing foundational linguistic resources and infrastructure for an endangered Amazonian language.
Contribution
It presents a new annotated treebank for Apurinã, with morphological features some of which are unique to the language, together with a finite-state description developed alongside it, supporting both language preservation and computational analysis.
Findings
First Universal Dependencies treebank for the Apurinã language (76 fully annotated sentences)
Applies 14 parts of speech and 7 augmented or new features
Supports language preservation and the development of computational tools
Abstract
This paper presents and discusses the first Universal Dependencies treebank for the Apurinã language. The treebank contains 76 fully annotated sentences and applies 14 parts of speech, as well as seven augmented or new features, some of which are unique to Apurinã. The construction of the treebank has also served as an opportunity to develop a finite-state description of the language and to facilitate the transfer of open-source infrastructure possibilities to an endangered language of the Amazon. The source materials used in the initial treebank reflect fieldwork practices in which not all tokens of all sentences are equally annotated. For this reason, establishing regular annotation practices for the entire Apurinã treebank is an ongoing project.
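Universal Dependencies treebanks are normally distributed in the 10-column CoNLL-U format. Counts like those in the abstract (sentences, part-of-speech tags, feature names) can be recomputed from such a file. The following is a minimal sketch that assumes a well-formed CoNLL-U file. The function name `treebank_stats` is hypothetical, and the toy sample uses placeholder word forms (w1, w2, w3) with generic UD features. None of it is taken from the Apurinã data.

```python
from collections import Counter


def treebank_stats(conllu_text: str) -> dict:
    """Count sentences, UPOS tags and feature names in CoNLL-U text (illustrative sketch)."""
    sentences = 0
    upos = Counter()
    features = Counter()
    in_sentence = False
    for raw in conllu_text.splitlines():
        if not raw.strip():
            # A blank line closes the current sentence block.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if raw.startswith("#"):
            continue  # sentence-level comments such as sent_id and text
        cols = raw.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {raw!r}")
        in_sentence = True
        token_id = cols[0]
        # Multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1) are not counted.
        if "-" in token_id or "." in token_id:
            continue
        upos[cols[3]] += 1  # column 4: UPOS
        if cols[5] != "_":  # column 6: FEATS, e.g. "Number=Plur|Person=3"
            for feat in cols[5].split("|"):
                name, _, _value = feat.partition("=")
                features[name] += 1
    if in_sentence:
        sentences += 1  # the last sentence may lack a trailing blank line
    return {"sentences": sentences, "upos": upos, "features": features}


# Hypothetical toy sample with placeholder forms, not Apurinã data.
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\tw1\tw1\tPRON\t_\tPerson=3\t2\tnsubj\t_\t_\n"
    "2\tw2\tw2\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\tw3\tw3\tNOUN\t_\tNumber=Plur\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    stats = treebank_stats(SAMPLE)
    print(stats["sentences"], len(stats["upos"]), sorted(stats["features"]))
```

Running this on a real treebank file would report the number of sentences, the set of part-of-speech tags in use and how often each feature name occurs. Because the sketch simply tallies whatever appears in the FEATS column, language-specific features are counted in the same way as standard ones.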
Taxonomy
Topics: Natural Language Processing Techniques
