Morphological Disambiguation of South Sámi with FSTs and Neural Networks
Mika Hämäläinen, Linda Wiechetek

TL;DR
This paper introduces a resource-efficient method for morphological disambiguation of South Sámi, an endangered language: an FST-based analyzer produces candidate readings, and a Bi-RNN trained on related North Sámi data plus synthetic South Sámi data selects among them.
Contribution
It presents a novel approach combining FSTs and neural networks for endangered language disambiguation without requiring extensive language-specific resources.
Findings
Effective disambiguation using minimal South Sámi data
Utilizes related North Sámi data for training
Applicable to other endangered languages
Abstract
We present a method for conducting morphological disambiguation for South Sámi, which is an endangered language. Our method uses an FST-based morphological analyzer to produce an ambiguous set of morphological readings for each word in a sentence. These readings are disambiguated with a Bi-RNN model trained on the related North Sámi UD Treebank and some synthetically generated South Sámi data. The disambiguation is done on the level of morphological tags ignoring word forms and lemmas; this makes it possible to use North Sámi training data for South Sámi without the need for a bilingual dictionary or aligned word embeddings. Our approach requires only minimal resources for South Sámi, which makes it usable and applicable in the contexts of any other endangered language as well.
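The tag-level idea in the abstract can be illustrated with a minimal sketch. It assumes the analyzer returns readings as `lemma+Tag1+Tag2` strings, which is an assumption about the output format rather than something stated here; the function names, placeholder lemmas, and tags below are hypothetical. The sketch only shows how lemmas might be dropped so that each word is represented by its candidate tag strings, which is the kind of language-independent input a tag-level model could consume. It does not implement the Bi-RNN itself.

```python
from typing import List


def delexicalize(reading: str) -> str:
    """Drop the lemma from a 'lemma+Tag1+Tag2' reading, keeping only the tags.

    The '+'-separated format is an assumption made for this sketch.
    """
    _lemma, _sep, tags = reading.partition("+")
    return tags


def tag_candidates(analyses: List[List[str]]) -> List[List[str]]:
    """Map each word's ambiguous readings to an ordered, de-duplicated list of tag strings."""
    result = []
    for readings in analyses:
        seen: List[str] = []
        for reading in readings:
            tags = delexicalize(reading)
            if tags not in seen:  # two lemmas can share the same tag string
                seen.append(tags)
        result.append(seen)
    return result


# Hypothetical analyzer output for a two-word sentence.
# Lemmas are placeholders and the tags are illustrative only.
sentence = [
    ["lemma_a+N+Sg+Nom", "lemma_b+V+Ind+Prs+Sg3"],  # ambiguous word
    ["lemma_c+V+Ind+Prs+Sg3"],                      # unambiguous word
]
candidates = tag_candidates(sentence)
```

Because the lemmas are discarded, the same representation could in principle be built from North Sámi treebank annotations and from South Sámi analyzer output, which is what makes sharing training data across the two languages plausible without a bilingual dictionary.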
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
