TL;DR
LemmaTag is a neural network model that jointly performs part-of-speech tagging and lemmatization for morphologically rich languages. It improves accuracy on both tasks by sharing the encoder between them and feeding the tagger's output to the lemmatizer.
Contribution
The paper introduces LemmaTag, a featureless neural architecture that tags and lemmatizes jointly, and reports accuracy above the prior state of the art on both tasks in Czech, German, and Arabic.
Findings
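Surpasses state-of-the-art accuracy in both POS tagging and lemmatization in Czech, German, and Arabic
Evaluated across several languages with complex morphology
Both tasks benefit from sharing the encoder, predicting tag subcategories, and using the tagger output as an input to the lemmatizer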
Abstract
We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings. We demonstrate that both tasks benefit from sharing the encoding part of the network, predicting tag subcategories, and using the tagger output as an input to the lemmatizer. We evaluate our model across several languages with complex morphology, where it surpasses state-of-the-art accuracy in both part-of-speech tagging and lemmatization in Czech, German, and Arabic.
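To make "predicting tag subcategories" concrete, here is a minimal sketch. It assumes a positional tag format in which each character encodes one morphological category, as in the Czech tags of the Prague Dependency Treebank. The abstract does not specify the tag format, so the position names, the helper functions `split_tag` and `subcategory_targets`, and the example tags are illustrative assumptions rather than the authors' code. The idea shown is only the data preparation: one full tag is decomposed into several smaller prediction targets, one per subcategory.

```python
from typing import Dict, List

# Illustrative names for the first five tag positions (an assumption of this sketch).
POSITION_NAMES: List[str] = ["pos", "subpos", "gender", "number", "case"]


def split_tag(tag: str, names: List[str] = POSITION_NAMES) -> Dict[str, str]:
    """Return one value per named position; '-' marks a missing position."""
    return {name: (tag[i] if i < len(tag) else "-") for i, name in enumerate(names)}


def subcategory_targets(tags: List[str]) -> Dict[str, List[str]]:
    """Turn a sentence's tags into one target sequence per subcategory."""
    per_word = [split_tag(tag) for tag in tags]
    return {name: [word[name] for word in per_word] for name in POSITION_NAMES}


if __name__ == "__main__":
    # Two made-up positional tags: a noun and a verb.
    print(subcategory_targets(["NNFS1-----A----", "VB-S---3P-AA---"]))
```

In a joint model, each of these target sequences could be given its own classifier on top of the shared encoder, alongside the classifier for the full tag. How the subcategory predictions are weighted or combined is not stated in the abstract and is left open here.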
