Cross-Lingual Dependency Parsing Using Code-Mixed TreeBank

Zhang Meishan; Zhang Yue; Fu Guohong

arXiv:1909.02235·cs.CL·September 6, 2019

TL;DR

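This paper introduces a cross-lingual dependency parsing approach based on code-mixed treebanks: only confident words in a source treebank are translated, and cross-lingual word embeddings carry the syntactic knowledge over to the target language. The resulting treebanks transfer more effectively than fully translated ones.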

Contribution

It proposes a new method of syntactic transfer via code-mixing, addressing alignment issues in traditional treebank translation for cross-lingual parsing.

Findings

01. Code-mixed treebanks outperform translated treebanks in parsing accuracy.

02. The method achieves competitive results among cross-lingual parsing techniques.

03. Leveraging cross-lingual embeddings enhances syntactic transfer effectiveness.

Abstract

Treebank translation is a promising method for cross-lingual transfer of syntactic dependency knowledge. The basic idea is to map dependency arcs from a source treebank to its target translation according to word alignments. This method, however, can suffer from imperfect alignment between source and target words. To address this problem, we investigate syntactic transfer by code mixing, translating only confident words in a source treebank. Cross-lingual word embeddings are leveraged for transferring syntactic knowledge to the target from the resulting code-mixed treebank. Experiments on Universal Dependency Treebanks show that code-mixed treebanks are more effective than translated treebanks, giving highly competitive performance among cross-lingual parsing methods.
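The code-mixing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Token` type, the `lexicon` mapping from a source word to a (translation, confidence) pair, and the `threshold` value are all hypothetical and chosen only for illustration. The abstract does not say how confidence is computed. The sketch replaces a source word with its target translation only when the confidence is high enough and leaves every dependency arc untouched, so no arc has to be projected across an uncertain alignment.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Token:
    form: str          # surface word
    head: int          # 1-based index of the head token, 0 for the root
    deprel: str        # dependency label
    lang: str = "src"  # "src" if untranslated, "tgt" if replaced


def code_mix(
    sentence: List[Token],
    lexicon: Dict[str, Tuple[str, float]],  # hypothetical: word -> (translation, confidence)
    threshold: float = 0.8,                 # hypothetical cut-off, not from the paper
) -> List[Token]:
    """Translate only confident words; keep heads and labels unchanged."""
    mixed = []
    for tok in sentence:
        entry = lexicon.get(tok.form.lower())
        if entry is not None and entry[1] >= threshold:
            # Confident translation: swap the word form, keep the arc.
            mixed.append(Token(entry[0], tok.head, tok.deprel, "tgt"))
        else:
            # No translation or low confidence: keep the source word as is.
            mixed.append(tok)
    return mixed


# Toy English-to-German example with made-up confidence scores.
sentence = [
    Token("the", 2, "det"),
    Token("dog", 3, "nsubj"),
    Token("barks", 0, "root"),
]
lexicon = {"the": ("der", 0.95), "dog": ("Hund", 0.90), "barks": ("bellt", 0.50)}

mixed = code_mix(sentence, lexicon)
print(" ".join(t.form for t in mixed))
```

Because the tree structure is copied verbatim, the mixed sentence has the same heads and labels as the source sentence. A parser trained on such data would still need a shared representation for source and target words, which is the role the abstract assigns to cross-lingual word embeddings.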

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification