Creating a Universal Dependencies Treebank of Spoken Frisian-Dutch Code-switched Data
Anouck Braggaar, Rob van der Goot

TL;DR
This paper discusses the challenges and solutions in creating a Universal Dependencies Treebank for spoken Frisian-Dutch code-switching data, highlighting annotation difficulties and improvements through iterative annotation.
Contribution
It introduces a methodology for annotating low-resource, code-switched spoken language data into Universal Dependencies, with iterative annotation and resolution of disagreements.
Findings
UAS increased by 7.8 points and LAS by 10.5 points between the first and third annotation rounds.
Identifies key challenges in annotating code-switched spoken language.
Proposes solutions for annotation difficulties in low-resource, informal speech data.
Abstract
This paper explores the difficulties of annotating transcribed spoken Dutch-Frisian code-switch utterances into Universal Dependencies. We make use of data from the FAME! corpus, which consists of transcriptions and audio data. Besides the usual annotation difficulties, this dataset is extra challenging because of Frisian being low-resource, the informal nature of the data, code-switching and non-standard sentence segmentation. As a starting point, two annotators annotated 150 random utterances in three stages of 50 utterances. After each stage, disagreements were discussed and resolved. An increase of 7.8 UAS and 10.5 LAS points was achieved between the first and third round. This paper will focus on the issues that arise when annotating a transcribed speech corpus. To resolve these issues several solutions are proposed.
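The UAS and LAS figures quoted in the abstract can be illustrated with a minimal sketch. It assumes the scores compare the two annotators' trees token by token, which the abstract does not state explicitly. UAS is the percentage of tokens given the same head, and LAS the percentage given the same head and the same relation label. The function name `attachment_scores` and the four-token example are hypothetical and not taken from the paper or the FAME! corpus.

```python
from typing import List, Tuple

# One token's annotation: (head index, dependency relation label).
# Head index 0 marks the root, as in CoNLL-U.
Arc = Tuple[int, str]


def attachment_scores(a: List[Arc], b: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent for two annotations of the same tokens."""
    if len(a) != len(b):
        raise ValueError("both annotations must cover the same tokens")
    if not a:
        raise ValueError("cannot score an empty utterance")
    # UAS: the two annotations agree on the head of the token.
    same_head = sum(1 for (h1, _), (h2, _) in zip(a, b) if h1 == h2)
    # LAS: the two annotations agree on both head and relation label.
    same_head_and_label = sum(1 for x, y in zip(a, b) if x == y)
    n = len(a)
    return 100.0 * same_head / n, 100.0 * same_head_and_label / n


# Hypothetical four-token utterance annotated by two people.
annotator_1 = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
annotator_2 = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "obl")]

uas, las = attachment_scores(annotator_1, annotator_2)
print(f"UAS = {uas:.1f}, LAS = {las:.1f}")  # UAS = 75.0, LAS = 50.0
```

In this example the annotators disagree on the head of the third token and on the label of the fourth, so three of four heads match (UAS 75.0) and two of four head-label pairs match (LAS 50.0).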
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
