One model, two languages: training bilingual parsers with harmonized treebanks

David Vilares; Carlos Gómez-Rodríguez; Miguel A. Alonso

arXiv:1507.08449 · cs.CL · May 20, 2016

TL;DR

This paper presents a method for training bilingual parsers on merged harmonized treebanks. The resulting parsers can analyze both monolingual and code-switched sentences, with accuracy comparable to, and in some cases better than, that of monolingual parsers.

Contribution

The paper introduces a novel approach to training bilingual parsers on merged treebanks, evaluates it across multiple language pairs, and reports preliminary experiments on code-switched text.

Findings

1. Bilingual parsers perform comparably to, or better than, their monolingual counterparts.

2. The approach is effective on the Universal Dependency Treebanks.

3. Preliminary results are promising for code-switched texts and for parsers trained on more than two languages.

Abstract

We introduce an approach to train lexicalized parsers using bilingual corpora obtained by merging harmonized treebanks of different languages, producing parsers that can analyze sentences in either of the learned languages, or even sentences that mix both. We test the approach on the Universal Dependency Treebanks, training with MaltParser and MaltOptimizer. The results show that these bilingual parsers are more than competitive, as most combinations not only preserve accuracy, but some even achieve significant improvements over the corresponding monolingual parsers. Preliminary experiments also show the approach to be promising on texts with code-switching and when more languages are added.
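As a rough illustration of the merging step described in the abstract, the Python sketch below pools the training sentences of two harmonized treebanks into a single corpus. It assumes CoNLL-style files in which each token is one line and sentences are separated by blank lines. The function names `read_sentences` and `merge_treebanks`, the optional shuffling, and the toy Spanish/English sentences are hypothetical illustrations, not the authors' code. Because harmonized treebanks share one tag set and one label set, the sketch copies sentences unchanged rather than relabeling them.

```python
import random


def read_sentences(conll_text):
    """Split CoNLL-style text into sentences (blocks separated by blank lines)."""
    sentences, current = [], []
    for line in conll_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:  # a blank line closes the current sentence
            sentences.append(current)
            current = []
    if current:  # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences


def merge_treebanks(treebanks, shuffle=True, seed=0):
    """Hypothetical helper: pool several harmonized treebanks into one corpus.

    Harmonized treebanks share POS tags and dependency labels, so sentences
    are copied unchanged. Shuffling is an assumption of this sketch.
    """
    merged = []
    for conll_text in treebanks:
        merged.extend(read_sentences(conll_text))
    if shuffle:
        random.Random(seed).shuffle(merged)  # reproducible mixing of languages
    if not merged:
        return ""
    # One token per line, one blank line after every sentence.
    return "".join("\n".join(sentence) + "\n\n" for sentence in merged)


# Toy, hypothetical CoNLL-X rows:
# ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL
es = "1\tEl\t_\tDET\tDET\t_\t2\tdet\t_\t_\n2\tgato\t_\tNOUN\tNOUN\t_\t0\tROOT\t_\t_\n\n"
en = "1\tThe\t_\tDET\tDET\t_\t2\tdet\t_\t_\n2\tcat\t_\tNOUN\tNOUN\t_\t0\tROOT\t_\t_\n\n"

bilingual = merge_treebanks([es, en])
print(len(read_sentences(bilingual)))  # 2
```

In this sketch the merged text would simply serve as ordinary training input for MaltParser, with MaltOptimizer used for configuration. Shuffling is optional; it only keeps the merged file from consisting of two contiguous monolingual blocks, which matters mainly if the data are later split or subsampled.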