External Lexical Information for Multilingual Part-of-Speech Tagging

Benoît Sagot (ALPAGE)

arXiv:1606.03676 · cs.CL · August 10, 2016 · 1 citation

Open Access

TL;DR

This paper compares feature-based and neural models for multilingual POS tagging. Feature-based models enriched with external lexical information do better on lexically richer datasets, such as those for morphologically rich languages, while neural models do better on datasets with less lexical variability, such as English.

Contribution

It demonstrates that feature-based models enriched with morphosyntactic lexicons can be competitive with neural methods in multilingual POS tagging.

Findings

01

Feature-based models perform better on lexically richer datasets (e.g. for morphologically rich languages).

02

Neural models score higher on datasets with less lexical variability (e.g. for English).

03

On average, all four systems perform similarly and reach state-of-the-art results on datasets covering 16 languages.

Abstract

Morphosyntactic lexicons and word vector representations have both proven useful for improving the accuracy of statistical part-of-speech taggers. Here we compare the performances of four systems on datasets covering 16 languages, two of these systems being feature-based (MEMMs and CRFs) and two of them being neural-based (bi-LSTMs). We show that, on average, all four approaches perform similarly and reach state-of-the-art results. Yet better performances are obtained with our feature-based models on lexically richer datasets (e.g. for morphologically rich languages), whereas neural-based results are higher on datasets with less lexical variability (e.g. for English). These conclusions hold in particular for the MEMM models relying on our system MElt, which benefited from newly designed features. This shows that, under certain conditions, feature-based approaches enriched with…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems