Many Languages, One Parser

Waleed Ammar; George Mulcaire; Miguel Ballesteros; Chris Dyer; Noah A. Smith

arXiv:1602.01595 · cs.CL · July 27, 2016

TL;DR

This paper trains a single multilingual dependency parser that combines shared multilingual representations with language-specific features, enabling effective parsing across many languages, including those with little or no annotated data.

Contribution

The authors introduce a single multilingual parser whose input combines multilingual word representations, token-level language information, and language-specific features. This design improves cross-lingual parsing performance, especially when the target language has limited training data.

Findings

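1. Performs well whether the target language has a large treebank, a small treebank, or no treebank for training
2. Generalizes across languages based on linguistic universals and typological similarities
3. Compares favorably to strong baselines in a range of data scenarios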

Abstract

We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser's performance compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.
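To make the three input components concrete, here is a minimal Python sketch of how they might be combined into one vector per token. It assumes plain concatenation of four looked-up embeddings. The toy vocabulary, cluster ids, tag names, `DIM`, and the helpers `make_table` and `token_vector` are all hypothetical, and random vectors stand in for trained embeddings. It illustrates the idea only and is not the composition used in clab/language-universal-parser.

```python
import random
from typing import Dict, List

DIM = 4              # toy size of each embedding component (hypothetical)
UNK = [0.0] * DIM    # fallback vector for unseen words, clusters, or tags

def make_table(keys, seed: int) -> Dict:
    """Build a toy lookup table of random vectors (stand-in for trained embeddings)."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(DIM)] for k in keys}

# (i) Multilingual word embeddings, keyed by (language, word); a real system
# would place all languages in one shared cross-lingual space.
word_emb = make_table([("en", "dog"), ("de", "hund"), ("en", "barks"), ("de", "bellt")], seed=1)

# (i) Multilingual word clusters: translations share a cluster id (hypothetical ids),
# and each cluster embedding is shared by every language.
word_cluster = {("en", "dog"): "C7", ("de", "hund"): "C7",
                ("en", "barks"): "C3", ("de", "bellt"): "C3"}
cluster_emb = make_table(["C3", "C7"], seed=2)

# (ii) Token-level language information: one embedding per language.
lang_emb = make_table(["en", "de"], seed=3)

# (iii) Language-specific fine-grained POS tags, keyed by (language, tag)
# because each treebank uses its own tagset.
fine_pos_emb = make_table([("en", "NN"), ("en", "VBZ"), ("de", "NN"), ("de", "VVFIN")], seed=4)

def token_vector(lang: str, word: str, fine_pos: str) -> List[float]:
    """Concatenate word, cluster, language, and fine-POS vectors (length 4 * DIM).

    Raises KeyError for a language with no language embedding.
    """
    key = (lang, word.lower())
    w = word_emb.get(key, UNK)
    c = cluster_emb.get(word_cluster.get(key, ""), UNK)
    l = lang_emb[lang]
    p = fine_pos_emb.get((lang, fine_pos), UNK)
    return w + c + l + p  # list concatenation builds a new list

# Usage: an English token and its German translation.
en = token_vector("en", "Dog", "NN")
de = token_vector("de", "Hund", "NN")
print(len(en), en[DIM:2 * DIM] == de[DIM:2 * DIM])
```

In this sketch the cluster component is identical for "Dog" and "Hund", while the language and fine-grained POS components differ. This is one plausible way a single model could share what it learns from one language's treebank with another language, while still seeing which language each token comes from.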

Code & Models

Repositories

clab/language-universal-parser (official)