Joint Lemmatization and Morphological Tagging with LEMMING

Thomas Müller; Ryan Cotterell; Alexander Fraser; Hinrich Schütze

arXiv:2405.18308 · cs.CL · May 29, 2024

TL;DR

LEMMING is a modular log-linear model that jointly performs lemmatization and morphological tagging. It sets a new state of the art in token-based statistical lemmatization on six languages without relying on morphological dictionaries or analyzers, and the results indicate that modeling tags and lemmata jointly benefits both tasks.

Contribution

The paper introduces LEMMING, a joint lemmatization and tagging model that is trainable on corpora annotated with gold-standard tags and lemmata and does not depend on morphological dictionaries or analyzers.

Findings

01. Sets a new state of the art in token-based statistical lemmatization on six languages.

02. Reduces Czech lemmatization error by 60%, from 4.05 to 1.58.

03. Jointly modeling morphological tags and lemmata benefits both tasks.
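
The 60% figure for Czech is a relative error reduction, and it can be checked against the two error values quoted in the abstract (4.05 before, 1.58 after). The short sketch below does that arithmetic; the function name `relative_error_reduction` is made up for illustration and does not come from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction of the baseline error removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Czech lemmatization error values as quoted in the abstract.
czech_baseline_error = 4.05
czech_lemming_error = 1.58

reduction = relative_error_reduction(czech_baseline_error, czech_lemming_error)
print(f"relative error reduction: {reduction:.1%}")
```

The result is about 61%, in line with the roughly 60% reduction reported.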

Abstract

We present LEMMING, a modular log-linear model that jointly models lemmatization and tagging and supports the integration of arbitrary global features. It is trainable on corpora annotated with gold standard tags and lemmata and does not rely on morphological dictionaries or analyzers. LEMMING sets the new state of the art in token-based statistical lemmatization on six languages; e.g., for Czech lemmatization, we reduce the error by 60%, from 4.05 to 1.58. We also give empirical evidence that jointly modeling morphological tags and lemmata is mutually beneficial.
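
To make "log-linear model that jointly models lemmatization and tagging" concrete, here is a toy sketch of a joint log-linear distribution over (lemma, tag) candidates for a single word. It is not LEMMING's implementation. The feature templates, the candidate list, and the hand-set weights are all assumptions chosen for illustration. The abstract does not specify the model's features, candidate generation, or training procedure.

```python
import math
import os
from typing import Dict, List, Tuple

Candidate = Tuple[str, str]  # (lemma, tag)


def features(word: str, lemma: str, tag: str) -> Dict[str, float]:
    """Hypothetical feature templates; the joint feature ties the suffix edit to the tag."""
    prefix = os.path.commonprefix([word, lemma])
    edit = f"{word[len(prefix):]}->{lemma[len(prefix):]}"
    return {
        f"tag={tag}": 1.0,
        f"edit={edit}|tag={tag}": 1.0,
    }


def score(weights: Dict[str, float], feats: Dict[str, float]) -> float:
    """Linear score: dot product of weights and features."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def joint_distribution(
    word: str, candidates: List[Candidate], weights: Dict[str, float]
) -> Dict[Candidate, float]:
    """p(lemma, tag | word) proportional to exp(score), normalized over the candidates."""
    scores = {c: score(weights, features(word, c[0], c[1])) for c in candidates}
    max_score = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {c: math.exp(s - max_score) for c, s in scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}


# Hand-set illustrative weights (a real model would learn these from annotated data).
weights = {
    "tag=VBD": 0.5,
    "edit=ed->|tag=VBD": 2.0,  # stripping "-ed" fits a past-tense tag
    "edit=->|tag=NN": 1.0,     # leaving the form unchanged fits a noun tag
}
candidates = [("walk", "VBD"), ("walked", "VBD"), ("walk", "NN"), ("walked", "NN")]

probs = joint_distribution("walked", candidates, weights)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Because the edit feature is conjoined with the tag, evidence for a tag changes which lemma is preferred and vice versa. This is the kind of interaction a joint model can capture and a pipeline of two independent models cannot.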

Taxonomy

Topics: Natural Language Processing Techniques