One model, two languages: training bilingual parsers with harmonized treebanks
David Vilares, Carlos Gómez-Rodríguez, Miguel A. Alonso

TL;DR
This paper presents a method for training bilingual parsers on merged harmonized treebanks. The resulting parsers can analyze monolingual and code-switched sentences with accuracy comparable to, or better than, that of monolingual parsers.
Contribution
The paper introduces an approach to training bilingual parsers from merged treebanks and demonstrates its effectiveness across multiple language pairs, with preliminary evidence on code-switched text.
Findings
Bilingual parsers perform comparably to or better than the corresponding monolingual ones.
The approach is effective on Universal Dependency Treebanks.
Preliminary results are promising for code-switched text and for parsers trained on more than two languages.
Abstract
We introduce an approach to train lexicalized parsers using bilingual corpora obtained by merging harmonized treebanks of different languages, producing parsers that can analyze sentences in either of the learned languages, or even sentences that mix both. We test the approach on the Universal Dependency Treebanks, training with MaltParser and MaltOptimizer. The results show that these bilingual parsers are more than competitive, as most combinations not only preserve accuracy, but some even achieve significant improvements over the corresponding monolingual parsers. Preliminary experiments also show the approach to be promising on texts with code-switching and when more languages are added.
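To make the merging step concrete, here is a minimal sketch of how two harmonized treebanks might be combined into one training corpus. It assumes blank-line-separated CoNLL-style files with a shared tag and label inventory, and it treats merging as plain concatenation of sentences; the abstract does not spell out the exact procedure, so that is an assumption. The helper names (`token`, `read_sentences`, `merge_treebanks`) and the toy sentences are hypothetical and exist only for illustration.

```python
from typing import List


def token(idx: int, form: str, upos: str, head: int, deprel: str) -> str:
    """Build one 10-column CoNLL-style token line; unused columns are '_'."""
    return "\t".join(
        [str(idx), form, "_", upos, "_", "_", str(head), deprel, "_", "_"]
    )


def read_sentences(conll_text: str) -> List[str]:
    """Split CoNLL-style text into sentence blocks separated by blank lines."""
    blocks: List[str] = []
    current: List[str] = []
    for line in conll_text.splitlines():
        if line.strip():
            current.append(line.rstrip())
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:  # last sentence when the text lacks a trailing blank line
        blocks.append("\n".join(current))
    return blocks


def merge_treebanks(*treebanks: str) -> str:
    """Concatenate the sentences of several harmonized treebanks.

    Assumption: because the annotation scheme is shared, sentences from
    different languages can sit side by side in one training file.
    """
    merged: List[str] = []
    for treebank in treebanks:
        merged.extend(read_sentences(treebank))
    return "\n\n".join(merged) + "\n\n"


# Toy one-sentence "treebanks" (invented data, not from the paper).
EN = "\n".join([token(1, "She", "PRON", 2, "nsubj"),
                token(2, "sleeps", "VERB", 0, "root")]) + "\n\n"
ES = "\n".join([token(1, "Ella", "PRON", 2, "nsubj"),
                token(2, "duerme", "VERB", 0, "root")]) + "\n\n"

merged = merge_treebanks(EN, ES)
sentences = read_sentences(merged)
```

The merged text could then be written to a single file and handed to a parser trainer in place of a monolingual treebank. Because both inputs use the same labels (`nsubj`, `root`), no per-language mapping is needed at this stage.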
