Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages
Katharina Kann, Manuel Mager, Ivan Meza-Ruiz, Hinrich Schütze

TL;DR
This paper enhances neural morphological segmentation for polysynthetic languages with novel multi-task training, data augmentation, and cross-lingual transfer, achieving improved performance in minimal-resource settings.
Contribution
It introduces two multi-task training approaches and two corresponding data augmentation methods, and demonstrates effective cross-lingual transfer for low-resource polysynthetic languages.
Findings
Neural seq2seq models obtain competitive performance on Mexican polysynthetic languages in minimal-resource settings.
Proposed methods improve segmentation accuracy across all tested languages.
A single multilingual model can replace multiple monolingual models for related languages, reducing the number of parameters by close to 75%.
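The roughly 75% parameter reduction is consistent with simple counting, under an assumption not stated above: that each monolingual model and the shared multilingual model have about the same size. Replacing N same-sized models with one then saves a fraction 1 - 1/N of the parameters, which is 75% for N = 4. A minimal sketch, with a hypothetical helper name:

```python
def parameter_reduction(num_monolingual_models: int) -> float:
    """Fraction of parameters saved when one shared model replaces
    `num_monolingual_models` separate models.

    Assumption (not from the summary): every model, including the
    shared one, has the same parameter count.
    """
    if num_monolingual_models < 1:
        raise ValueError("need at least one monolingual model")
    return 1.0 - 1.0 / num_monolingual_models


# Four same-sized models collapsed into one shared model.
print(parameter_reduction(4))
```

In practice a shared model may need a slightly larger vocabulary than any single monolingual model, which would explain a reduction that is close to, rather than exactly, this idealized value.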
Abstract
Morphological segmentation for polysynthetic languages is challenging, because a word may consist of many individual morphemes and training data can be extremely scarce. Since neural sequence-to-sequence (seq2seq) models define the state of the art for morphological segmentation in high-resource settings and for (mostly) European languages, we first show that they also obtain competitive performance for Mexican polysynthetic languages in minimal-resource settings. We then propose two novel multi-task training approaches (one that requires external unlabeled resources and one that does not) and two corresponding data augmentation methods, improving over the neural baseline for all languages. Finally, we explore cross-lingual transfer as a third way to fortify our neural model and show that we can train one single multi-lingual model for related languages while maintaining comparable or even…
