Leveraging Transformer-Based Models for Predicting Inflection Classes of Words in an Endangered Sami Language
Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

TL;DR
This paper develops a transformer-based model to classify lexical and morphosyntactic features of Skolt Sami, an endangered language, aiding linguistic documentation and language preservation efforts.
Contribution
It introduces an end-to-end pipeline for inflection class prediction in Skolt Sami, addressing data scarcity and linguistic complexity with a novel transformer approach.
Findings
Achieved an average weighted F1 score of 1.00 for POS classification.
Achieved an average weighted F1 score of 0.81 for inflection class classification.
Provided publicly available trained models and code for endangered language NLP.
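
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a weighted F1 score is conventionally computed: per-class F1 scores averaged with weights proportional to each class's number of true instances. It is not the authors' evaluation code. The function name `weighted_f1` and the labels `"A"` and `"B"` are hypothetical placeholders, not actual Skolt Sami inflection classes.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty label list")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp  # true instances of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score


# Hypothetical toy labels: three words of class "A", one of class "B".
gold = ["A", "A", "A", "B"]
pred = ["A", "A", "B", "B"]
toy_score = weighted_f1(gold, pred)
```

Because each class is weighted by its support, frequent classes dominate the average, so a high weighted F1 can coexist with weak performance on rare inflection classes.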
Abstract
This paper presents a methodology for training a transformer-based model to classify lexical and morphosyntactic features of Skolt Sami, an endangered Uralic language characterized by complex morphology. The goal of our approach is to create an effective system for understanding and analyzing Skolt Sami, given the limited data availability and linguistic intricacies inherent to the language. Our end-to-end pipeline includes data extraction, augmentation, and training a transformer-based model capable of predicting inflection classes. The motivation behind this work is to support language preservation and revitalization efforts for minority languages like Skolt Sami. Accurate classification not only helps improve the state of Finite-State Transducers (FSTs) by providing greater lexical coverage but also contributes to systematic linguistic documentation for researchers working with newly…
Taxonomy
Topics: Natural Language Processing Techniques
