Cross-Lingual Syntactic Transfer with Limited Resources
Mohammad Sadegh Rasooli, Michael Collins

TL;DR
This paper presents a simple method for cross-lingual transfer of dependency parsers that works when only a small amount of translation data is available. It combines cross-lingual word clusters, lexical transfer, and density-driven annotation projection.
Contribution
A three-step method for cross-lingual syntactic transfer that works with small translation corpora and improves over the previous state of the art in several languages.
Findings
Significant improvements over previous methods with Bible translation data
Additional gains using Europarl corpus as translation data
Effective across 38 Universal Dependencies datasets
Abstract
We describe a simple but effective method for cross-lingual syntactic transfer of dependency parsers, in the scenario where a large amount of translation data is not available. The method makes use of three steps: 1) a method for deriving cross-lingual word clusters, which can then be used in a multilingual parser; 2) a method for transferring lexical information from a target language to source language treebanks; 3) a method for integrating these steps with the density-driven annotation projection method of Rasooli and Collins (2015). Experiments show improvements over the state-of-the-art in several languages used in previous work, in a setting where the only source of translation data is the Bible, a considerably smaller corpus than the Europarl corpus used in previous work. Results using the Europarl corpus as a source of translation data show additional improvements over the…
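Step 2 of the abstract (lexical transfer) can be illustrated with a minimal sketch. It assumes a bilingual dictionary mapping source-language words to target-language words is already available, for example one extracted from word alignments over the translation data. The names `Token`, `transfer_lexicon`, and `toy_dict` are hypothetical and are not taken from the paper, and the one-to-one word substitution shown here is a simplification rather than the authors' exact procedure.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Token:
    form: str    # surface word
    head: int    # 1-based index of the head token, 0 for the root
    deprel: str  # dependency label


def transfer_lexicon(sentence: List[Token], dictionary: Dict[str, str]) -> List[Token]:
    """Swap each source word for its dictionary translation when one exists.

    Heads and labels are left untouched, so the dependency tree is preserved.
    Words missing from the dictionary keep their original form.
    """
    return [
        replace(tok, form=dictionary.get(tok.form.lower(), tok.form))
        for tok in sentence
    ]


# Toy English source sentence with a toy English-to-German dictionary (assumed data).
source = [
    Token("the", 2, "det"),
    Token("dog", 3, "nsubj"),
    Token("barks", 0, "root"),
]
toy_dict = {"the": "der", "dog": "Hund", "barks": "bellt"}

translated = transfer_lexicon(source, toy_dict)
```

The result is a source-language tree whose words are in the target language, which is the kind of data a parser for the target language could then be trained on.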
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
