Morphology Without Borders: Clause-Level Morphology
Omer Goldman, Reut Tsarfaty

TL;DR
This paper proposes treating morphology as a clause-level rather than a word-level phenomenon. It introduces a new dataset and tasks that prove harder than their word-level counterparts and that interface better with contextual language models, with the goal of more consistent cross-linguistic morphological analysis.
Contribution
It introduces MightyMorph, a clause-level morphological dataset covering English, German, Turkish, and Hebrew, and derives three clause-level tasks from it, including inflection and reinflection, for studying morphology in context and across languages.
Findings
Clause-level tasks are more challenging than word-level tasks.
The dataset covers four typologically different languages: English, German, Turkish, and Hebrew.
Clause-level morphology aligns better with contextual language models.
Abstract
Morphological tasks use large multi-lingual datasets that organize words into inflection tables, which then serve as training and evaluation data for various tasks. However, a closer inspection of these data reveals profound cross-linguistic inconsistencies that arise from the lack of a clear linguistic and operational definition of what a word is, and that severely impair the universality of the derived tasks. To overcome this deficiency, we propose to view morphology as a clause-level phenomenon rather than a word-level one. It is anchored in a fixed yet inclusive set of features that encapsulates all functions realized in a saturated clause. We deliver MightyMorph, a novel dataset for clause-level morphology covering 4 typologically different languages: English, German, Turkish and Hebrew. We use this dataset to derive 3 clause-level morphological tasks: inflection, reinflection and…
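To make the word-boundary problem concrete, here is a minimal sketch of how a single feature bundle can surface as several words in one language and as one word in another. The class name, the feature labels, and the two examples are illustrative assumptions and are not taken from MightyMorph or its schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InflectionInstance:
    """Hypothetical inflection example: lemma + feature bundle -> surface form."""
    lemma: str
    features: frozenset  # illustrative labels, not the dataset's actual feature set
    form: str

    def word_count(self) -> int:
        # Count whitespace-delimited tokens in the surface form.
        return len(self.form.split())


# One feature bundle: indicative, future, negated, first-person singular subject.
FEATURES = frozenset({"IND", "FUT", "NEG", "NOM(1,SG)"})

# English spreads the bundle over a pronoun, an auxiliary, a negator, and the verb.
english = InflectionInstance("give", FEATURES, "I will not give")
# Turkish packs the same bundle into a single agglutinated verb form.
turkish = InflectionInstance("vermek", FEATURES, "vermeyeceğim")

for inst in (english, turkish):
    print(f"{inst.lemma!r} -> {inst.form!r} ({inst.word_count()} word(s))")
```

Under a word-level definition, only the Turkish form would fit in a single inflection-table cell. Treating the whole clause as the target puts both examples on the same footing.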
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
