TL;DR
Morse is a sequence-to-sequence model that analyzes morphology by generating lemmas and features as sequences, outperforming previous models across multiple languages and resource settings.
Contribution
The paper introduces Morse, a novel encoder-decoder model that generates morphological analyses as sequences, enabling better handling of rare tags and complex inflectional structures.
Findings
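Achieved state-of-the-art results in nine languages of differing morphological complexity, including low-resource and high-resource settings.
Handled rare and unseen tags by generating morphological features individually rather than as one combined tag, outperforming whole-tag models.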
Produced high accuracy in Turkish morphology analysis.
Abstract
We introduce Morse, a recurrent encoder-decoder model that produces morphological analyses of each word in a sentence. The encoder turns the relevant information about the word and its context into a fixed-size vector representation and the decoder generates the sequence of characters for the lemma followed by a sequence of individual morphological features. We show that generating morphological features individually rather than as a combined tag allows the model to handle rare or unseen tags and outperform whole-tag models. In addition, generating morphological features as a sequence rather than, e.g., an unordered set allows our model to produce an arbitrary number of features that represent multiple inflectional groups in morphologically complex languages. We obtain state-of-the-art results in nine languages of different morphological complexity under low-resource, high-resource and…
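The output format described in the abstract (lemma characters followed by individual feature tokens) can be illustrated with a small sketch. This is not the authors' code: the function names `linearize` and `delinearize`, the feature labels (`Noun`, `Plur`, and so on), and the English example word are all hypothetical, chosen only to show why a per-feature vocabulary can express a tag combination that a whole-tag vocabulary has never seen.

```python
from typing import List, Set, Tuple


def linearize(lemma: str, features: List[str]) -> List[str]:
    """Build a decoder target: lemma characters, then one token per feature."""
    return list(lemma) + list(features)


def delinearize(tokens: List[str], feature_vocab: Set[str]) -> Tuple[str, List[str]]:
    """Split a decoded sequence back into (lemma, features).

    Assumes feature tokens never collide with single lemma characters.
    """
    # Index of the first feature token; everything before it is the lemma.
    split = next((i for i, t in enumerate(tokens) if t in feature_vocab), len(tokens))
    return "".join(tokens[:split]), tokens[split:]


# Hypothetical training analyses (feature labels are illustrative only).
seen_analyses = [["Noun", "Sing"], ["Noun", "Plur"], ["Verb", "Past"]]

# A whole-tag model has one vocabulary entry per combined tag.
whole_tag_vocab = {"+".join(feats) for feats in seen_analyses}
# A feature-sequence model has one entry per individual feature.
feature_vocab = {f for feats in seen_analyses for f in feats}

target = linearize("book", ["Noun", "Plur"])
lemma, feats = delinearize(target, feature_vocab)

# A combination never seen as a whole tag, built from seen features.
unseen = ["Verb", "Plur"]
expressible_as_whole_tag = "+".join(unseen) in whole_tag_vocab
expressible_as_features = all(f in feature_vocab for f in unseen)
```

Here `expressible_as_whole_tag` is false while `expressible_as_features` is true, which is the intuition behind the abstract's claim about rare or unseen tags. Because the target is a sequence, it can also carry any number of feature tokens per word.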
