Morphological Disambiguation of South Sámi with FSTs and Neural Networks

Mika Hämäläinen; Linda Wiechetek

arXiv:2004.14062 · cs.CL · April 30, 2020 · 1 cites

TL;DR

This paper introduces a resource-efficient method for morphological disambiguation of South Sámi, an endangered language. An FST-based analyzer produces the candidate readings for each word, and a neural network trained on data from the related North Sámi selects among them, so only minimal South Sámi resources are needed.

Contribution

It presents a novel approach combining FSTs and neural networks for endangered language disambiguation without requiring extensive language-specific resources.

Findings

01 Effective disambiguation using minimal South Sámi data

02 Utilizes related North Sámi data for training

03 Applicable to other endangered languages

Abstract

We present a method for conducting morphological disambiguation for South Sámi, which is an endangered language. Our method uses an FST-based morphological analyzer to produce an ambiguous set of morphological readings for each word in a sentence. These readings are disambiguated with a Bi-RNN model trained on the related North Sámi UD Treebank and some synthetically generated South Sámi data. The disambiguation is done on the level of morphological tags ignoring word forms and lemmas; this makes it possible to use North Sámi training data for South Sámi without the need for a bilingual dictionary or aligned word embeddings. Our approach requires only minimal resources for South Sámi, which makes it usable and applicable in the contexts of any other endangered language as well.
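The tag-level representation described in the abstract can be illustrated with a minimal sketch. It assumes the analyzer returns each reading as a `lemma+Tag+Tag` string; that format is an assumption, since the abstract does not specify it. The function names `delexicalize` and `tag_candidates` and the placeholder lemmas are hypothetical, and the Bi-RNN that would choose among the candidates is not shown.

```python
from typing import List


def delexicalize(reading: str) -> str:
    """Drop the lemma from a 'lemma+Tag1+Tag2' reading, keeping only the tags.

    Assumes (hypothetically) that the first '+' separates lemma from tags.
    """
    _lemma, _sep, tags = reading.partition("+")
    return tags


def tag_candidates(analyses: List[List[str]]) -> List[List[str]]:
    """For each token, return its sorted, de-duplicated tag-only readings."""
    return [sorted({delexicalize(r) for r in readings}) for readings in analyses]


# Hypothetical analyzer output for a three-token sentence.
# The lemmas are placeholders, not real South Sámi words.
sentence_analyses = [
    ["lemma_a+N+Sg+Nom", "lemma_b+N+Sg+Nom", "lemma_a+V+Ind+Prs+Sg3"],
    ["lemma_c+V+Ind+Prs+Sg3"],
    ["lemma_d+N+Pl+Nom", "lemma_d+N+Sg+Gen"],
]

candidates = tag_candidates(sentence_analyses)
for token_index, tags in enumerate(candidates):
    print(token_index, tags)
```

Because lemmas and word forms are discarded, two readings that differ only in lemma collapse into one candidate, and a tag sequence derived from North Sámi data has the same shape as one derived from South Sámi data, provided both use a shared tag inventory.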

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis