Neural Morphological Tagging for Nguni Languages
Cael Marquard, Simbarashe Mawere, Francois Meyer

TL;DR
This paper explores neural methods for morphological tagging in the four Nguni languages, showing that neural taggers outperform a traditional rule-based parser and that training from scratch yields better results than finetuning pretrained language models.
Contribution
It introduces neural morphological taggers for the Nguni languages, compares sequence labellers trained from scratch (LSTMs and neural CRFs) with finetuned pretrained language models, and evaluates both against a traditional rule-based parser.
Findings
Neural taggers outperform the rule-based baseline.
Training from scratch outperforms fine-tuning pretrained models.
Neural taggers are viable for Nguni morphological analysis.
Abstract
Morphological parsing is the task of decomposing words into morphemes, the smallest units of meaning in a language, and labelling their grammatical roles. It is a particularly challenging task for agglutinative languages, such as the Nguni languages of South Africa, which construct words by concatenating multiple morphemes. A morphological parsing system can be framed as a pipeline with two separate components, a segmenter followed by a tagger. This paper investigates the use of neural methods to build morphological taggers for the four Nguni languages. We compare two classes of approaches: training neural sequence labellers (LSTMs and neural CRFs) from scratch and finetuning pretrained language models. We compare performance across these two categories, as well as to a traditional rule-based morphological parser. Neural taggers comfortably outperform the rule-based baseline and models…
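The two-stage framing in the abstract (a segmenter followed by a tagger) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the lookup tables stand in for trained models, the function names are hypothetical, and the tag names are placeholders rather than the tagset used in the paper. The only point it shows is the interface: the segmenter maps a word to morphemes, and the tagger assigns exactly one label per morpheme.

```python
from typing import Callable, List, Tuple

# Type aliases for the two pipeline stages (hypothetical names).
Segmenter = Callable[[str], List[str]]
Tagger = Callable[[List[str]], List[str]]

# Toy lookup tables standing in for trained models. The tag names are
# illustrative placeholders, not the paper's tagset.
TOY_SEGMENTATIONS = {"ngiyabonga": ["ngi", "ya", "bong", "a"]}
TOY_TAGS = {
    "ngi": "SubjectConcord",
    "ya": "PresentMarker",
    "bong": "VerbRoot",
    "a": "FinalVowel",
}


def toy_segmenter(word: str) -> List[str]:
    """Split a word into morphemes; unknown words are returned whole."""
    return TOY_SEGMENTATIONS.get(word.lower(), [word])


def toy_tagger(morphemes: List[str]) -> List[str]:
    """Assign one tag per morpheme; unknown morphemes get 'UNK'."""
    return [TOY_TAGS.get(m, "UNK") for m in morphemes]


def parse(
    word: str,
    segmenter: Segmenter = toy_segmenter,
    tagger: Tagger = toy_tagger,
) -> List[Tuple[str, str]]:
    """Run the segmenter, then the tagger, and pair morphemes with tags."""
    morphemes = segmenter(word)
    tags = tagger(morphemes)
    # Tagging is sequence labelling: the output must align with the input.
    if len(tags) != len(morphemes):
        raise ValueError("tagger must return exactly one tag per morpheme")
    return list(zip(morphemes, tags))


if __name__ == "__main__":
    for morpheme, tag in parse("ngiyabonga"):
        print(f"{morpheme}\t{tag}")
```

Because the tagger only sees a list of morphemes and returns a list of labels of the same length, any sequence labeller could be swapped in for `toy_tagger` without changing `parse`.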
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution · Syntax, Semantics, Linguistic Variation
