FST Morphology for the Endangered Skolt Sami Language
Jack Rueter, Mika Hämäläinen

TL;DR
This paper introduces a finite-state transducer-based morphological analyzer and generator for Skolt Sami, an endangered Uralic language, aiding its linguistic revitalization by providing detailed morphological analysis despite limited resources.
Contribution
It reports advances in an FST-based morphological analyzer and generator for Skolt Sami that covers extensive inflectional and derivational morphology, supporting language revitalization efforts.
Findings
Covers over 30,000 Skolt Sami words in 148 inflectional paradigms
Covers over 12 derivational forms
Supports language revitalization initiatives
Abstract
We present advances in the development of an FST-based morphological analyzer and generator for Skolt Sami. Like other minority Uralic languages, Skolt Sami has a rich morphology on the one hand and little gold-standard material on the other. Together, these make NLP approaches to the language difficult without a solid morphological analysis. The language is severely endangered, and the work presented in this paper is part of a larger revitalization effort. Furthermore, we intersperse our description with facilitation and description practices not well documented in the infrastructure. Currently, the analyzer covers over 30,000 Skolt Sami words in 148 inflectional paradigms and over 12 derivational forms.
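The abstract describes a single FST that works both as an analyzer and as a generator. As a hedged illustration of that idea only (not the authors' implementation, which is not shown here), the Python sketch below builds a tiny nondeterministic transducer whose arcs pair an upper-side symbol (lemma plus tags) with a lower-side symbol (surface form), then reads it from either side. The paradigm, the lemmas `kelo` and `miru`, the suffixes, and the tag names (`+N`, `+Sg`, `+Gen`, ...) are all invented for the example and are not Skolt Sami morphology; production systems typically write lexicons in a formalism such as lexc and compile them with a toolkit such as HFST or foma.

```python
from collections import defaultdict

EPS = ""  # epsilon: an arc side that consumes or emits nothing


class FST:
    """A tiny nondeterministic finite-state transducer (illustrative sketch).

    Each arc carries an (upper, lower) symbol pair: the upper side holds
    lemma + tags, the lower side holds the surface string.
    """

    def __init__(self):
        self.arcs = defaultdict(list)  # state -> [(upper, lower, next_state)]
        self.finals = set()
        self.n_states = 1  # state 0 is the start state

    def _new_state(self):
        self.n_states += 1
        return self.n_states - 1

    def add_path(self, upper, lower):
        """Add one path mapping the symbol list `upper` to `lower`."""
        # Pad the shorter side with epsilons so symbols pair up one-to-one.
        n = max(len(upper), len(lower))
        up = list(upper) + [EPS] * (n - len(upper))
        lo = list(lower) + [EPS] * (n - len(lower))
        state = 0
        for u, l in zip(up, lo):
            nxt = self._new_state()
            self.arcs[state].append((u, l, nxt))
            state = nxt
        self.finals.add(state)

    def apply(self, symbols, side):
        """Return all outputs for `symbols` read on `side` ('upper' or 'lower')."""
        results = set()
        stack = [(0, 0, ())]  # (state, input position, output so far)
        while stack:
            state, i, out = stack.pop()
            if i == len(symbols) and state in self.finals:
                results.add(out)
            for u, l, nxt in self.arcs[state]:
                inp, outp = (u, l) if side == "upper" else (l, u)
                emitted = out + ((outp,) if outp != EPS else ())
                if inp == EPS:
                    stack.append((nxt, i, emitted))  # epsilon input: no symbol read
                elif i < len(symbols) and symbols[i] == inp:
                    stack.append((nxt, i + 1, emitted))
        return results


# Hypothetical toy paradigm (invented suffixes, NOT Skolt Sami morphology):
# each tag sequence maps to a surface suffix appended to the lemma.
TOY_NOUN_PARADIGM = {
    ("+N", "+Sg", "+Nom"): "",
    ("+N", "+Sg", "+Gen"): "n",
    ("+N", "+Pl", "+Nom"): "t",
}

# Hypothetical lexicon: invented lemmas assigned to the toy paradigm.
LEXICON = {"kelo": TOY_NOUN_PARADIGM, "miru": TOY_NOUN_PARADIGM}


def build_fst(lexicon):
    """Compile every (lemma, tags) -> surface pair into one transducer."""
    fst = FST()
    for lemma, paradigm in lexicon.items():
        for tags, suffix in paradigm.items():
            fst.add_path(list(lemma) + list(tags), list(lemma + suffix))
    return fst


def analyze(fst, word):
    """Surface form -> set of analyses such as 'kelo+N+Sg+Gen'."""
    return {"".join(out) for out in fst.apply(list(word), side="lower")}


def generate(fst, lemma, tags):
    """Lemma plus tag sequence -> set of surface forms."""
    return {"".join(out) for out in fst.apply(list(lemma) + list(tags), side="upper")}
```

Because analysis and generation traverse the same arcs from opposite sides, `analyze(build_fst(LEXICON), "kelon")` returns `{'kelo+N+Sg+Gen'}`, and `generate` with `("+N", "+Sg", "+Gen")` maps `kelo` back to `{'kelon'}`. This naive construction adds one separate chain per word form; real FST compilers share prefixes and minimize the machine, which is what makes a lexicon of tens of thousands of words practical.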
