LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs

Daniel Kondratyuk; Tomáš Gavenčiak; Milan Straka; Jan Hajič

arXiv:1808.03703·cs.CL·August 28, 2018

TL;DR

LemmaTag is a neural network model that jointly performs part-of-speech tagging and lemmatization for morphologically-rich languages, improving accuracy by sharing representations and leveraging task interdependencies.

Contribution

The paper introduces LemmaTag, a neural architecture that tags and lemmatizes jointly and surpasses prior state-of-the-art accuracy on both tasks in Czech, German, and Arabic.

Findings

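01. Surpasses state-of-the-art accuracy in both POS tagging and lemmatization in Czech, German, and Arabic

02. Evaluated across several languages with complex morphology

03. Both tasks benefit from sharing the encoder, predicting tag subcategories, and feeding the tagger output to the lemmatizer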

Abstract

We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings. We demonstrate that both tasks benefit from sharing the encoding part of the network, predicting tag subcategories, and using the tagger output as an input to the lemmatizer. We evaluate our model across several languages with complex morphology; it surpasses state-of-the-art accuracy in both part-of-speech tagging and lemmatization in Czech, German, and Arabic.
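
One way to read "predicting tag subcategories": when tags are positional, each position can act as its own auxiliary prediction target alongside the full tag. The sketch below illustrates only that target decomposition, not the network itself. It assumes 15-position tags in the style of the Prague Dependency Treebank for Czech; the position labels and the function names `split_tag` and `training_targets` are illustrative and are not taken from the paper or its code.

```python
from typing import Dict

# Assumption: 15-position PDT-style positional tags (Czech).
# These labels are illustrative names for the positions.
PDT_POSITIONS = [
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve1", "Reserve2", "Var",
]


def split_tag(tag: str) -> Dict[str, str]:
    """Map each position of a positional tag to its one-character value."""
    if len(tag) != len(PDT_POSITIONS):
        raise ValueError(
            f"expected {len(PDT_POSITIONS)} positions, got {len(tag)}: {tag!r}"
        )
    return dict(zip(PDT_POSITIONS, tag))


def training_targets(tag: str) -> Dict[str, str]:
    """Return the full tag plus one auxiliary target per subcategory.

    Hypothetical helper: a joint model could attach one classifier per key.
    """
    targets = {"Tag": tag}
    targets.update(split_tag(tag))
    return targets


# Example tag: noun, feminine, singular, nominative, affirmative.
targets = training_targets("NNFS1-----A----")
print(targets["POS"], targets["Gender"], targets["Number"], targets["Case"])
```

With targets split this way, a model is rewarded for getting the case or gender right even when the full tag is wrong, which is one plausible reason subcategory prediction could help both tasks.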
