Joint Lemmatization and Morphological Tagging with LEMMING
Thomas Müller, Ryan Cotterell, Alexander Fraser, Hinrich Schütze

TL;DR
LEMMING is a modular model that jointly performs lemmatization and morphological tagging. It achieves state-of-the-art lemmatization results on six languages without relying on morphological dictionaries, and shows that modeling the two tasks jointly benefits both.
Contribution
The paper introduces LEMMING, a joint lemmatization and tagging model that improves accuracy without depending on morphological dictionaries or analyzers.
Findings
Sets a new state of the art in token-based statistical lemmatization for six languages.
Reduces the Czech lemmatization error by 60%, from 4.05 to 1.58.
Joint modeling benefits both lemmatization and tagging.
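The 60% figure is a relative reduction in error rate. A minimal check of the arithmetic, using the Czech error rates of 4.05 and 1.58 quoted in the abstract (the helper name is ours, chosen for illustration):

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction of the baseline error that the new system eliminates."""
    return (baseline_error - new_error) / baseline_error


# Czech lemmatization error rates quoted in the abstract.
reduction = relative_error_reduction(4.05, 1.58)
print(f"{reduction:.1%}")  # prints "61.0%"
```

(4.05 - 1.58) / 4.05 is about 0.61, which the paper reports as a 60% reduction.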
Abstract
We present LEMMING, a modular log-linear model that jointly models lemmatization and tagging and supports the integration of arbitrary global features. It is trainable on corpora annotated with gold standard tags and lemmata and does not rely on morphological dictionaries or analyzers. LEMMING sets the new state of the art in token-based statistical lemmatization on six languages; e.g., for Czech lemmatization, we reduce the error by 60%, from 4.05 to 1.58. We also give empirical evidence that jointly modeling morphological tags and lemmata is mutually beneficial.
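As a rough illustration of what "jointly models lemmatization and tagging" means for a log-linear model, the sketch below scores every (lemma, tag) candidate pair for a word as exp(weights · features), normalizes over all pairs, and trains the weights by gradient ascent on the log-likelihood of the gold pair. This is a minimal sketch under stated assumptions, not LEMMING itself: the feature templates, the hand-picked candidate lists, the toy data, and all function names (`features`, `joint_distribution`, `gradient_step`) are hypothetical, and the paper's actual candidate generation, feature set, and optimizer are not reproduced here.

```python
import math
from collections import defaultdict
from itertools import product


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def features(word: str, lemma: str, tag: str) -> list[str]:
    """Hypothetical indicator features over a (word, lemma, tag) triple."""
    k = common_prefix_len(word, lemma)
    edit = f"{word[k:]}->{lemma[k:]}"  # suffix rewrite that maps word to lemma
    return [
        f"tag={tag}",
        f"edit={edit}",
        f"edit={edit}|tag={tag}",  # joint feature coupling lemma choice and tag
        f"lemma_suffix={lemma[-2:]}|tag={tag}",
    ]


def joint_distribution(word, lemmas, tags, weights):
    """p(lemma, tag | word) proportional to exp(sum of feature weights)."""
    scores = {
        (lemma, tag): sum(weights[f] for f in features(word, lemma, tag))
        for lemma, tag in product(lemmas, tags)
    }
    top = max(scores.values())  # subtract the max for a stable log-sum-exp
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {pair: math.exp(s - log_z) for pair, s in scores.items()}


def gradient_step(word, gold, lemmas, tags, weights, lr=0.1):
    """One ascent step on log p(gold | word): observed minus expected features."""
    probs = joint_distribution(word, lemmas, tags, weights)
    for f in features(word, *gold):
        weights[f] += lr
    for (lemma, tag), p in probs.items():
        for f in features(word, lemma, tag):
            weights[f] -= lr * p


# Toy training example with hand-picked candidates (hypothetical data).
weights = defaultdict(float)
word, lemmas, tags = "walked", ["walked", "walk", "walke"], ["VBD", "NN"]
for _ in range(100):
    gradient_step(word, ("walk", "VBD"), lemmas, tags, weights)

probs = joint_distribution(word, lemmas, tags, weights)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Because the score is just a weighted sum of feature indicators, further features of the whole (word, lemma, tag) triple can be added as new templates without touching the inference or training code. In this toy version, the `edit=...|tag=...` template is where the lemma and tag decisions interact.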
Taxonomy
Topics: Natural Language Processing Techniques
