On the Role of Morphological Information for Contextual Lemmatization
Olia Toporkov, Rodrigo Agerri

TL;DR
This study empirically investigates the impact of explicit morphological information on contextual lemmatization across six languages, finding that modern models often perform well without such features and that current evaluation methods may be insufficient.
Contribution
The paper challenges the assumption that detailed morphological features improve lemmatization, showing that modern contextual embeddings already encode enough morphological information for the task and highlighting issues with current evaluation practices.
Findings
Morphological features have limited impact on lemmatization performance.
Simple UPOS tags can be as effective as detailed morphological features.
Current evaluation practices may not adequately differentiate between models.
Abstract
Lemmatization is a natural language processing (NLP) task which consists of producing, from a given inflected word, its canonical form or lemma. Lemmatization is one of the basic tasks that facilitate downstream NLP applications, and is of particular importance for highly inflected languages. Given that the process of obtaining a lemma from an inflected word can be explained by looking at its morphosyntactic category, including fine-grained morphosyntactic information to train contextual lemmatizers has become common practice, without considering whether that is optimal in terms of downstream performance. To address this issue, in this paper we empirically investigate the role of morphological information in developing contextual lemmatizers for six languages spanning a varied spectrum of morphological complexity: Basque, Turkish, Russian, Czech, Spanish and English. Furthermore, and…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
