Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

Zewen Chi; Li Dong; Bo Zheng; Shaohan Huang; Xian-Ling Mao; Heyan Huang; Furu Wei

arXiv:2106.06381 · cs.CL · September 14, 2021 · 1 cites

TL;DR

This paper proposes denoising word alignment, a self-labeled pre-training task for cross-lingual language models. It improves cross-lingual transfer, especially on token-level tasks, and lets the pretrained model double as a word aligner.

Contribution

Introduces denoising word alignment as a pre-training task, improving cross-lingual transfer and alignment accuracy through an EM-style training process.

Findings

01 Improves cross-lingual transfer on token-level tasks.

02 Achieves low error rates on alignment benchmarks.

03 Enhances pretrained models with self-labeled word alignment.

Abstract

The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences. In this paper, we introduce denoising word alignment as a new cross-lingual pre-training task. Specifically, the model first self-labels word alignments for parallel sentences. Then we randomly mask tokens in a bitext pair. Given a masked token, the model uses a pointer network to predict the aligned token in the other language. We alternately perform the above two steps in an expectation-maximization manner. Experimental results show that our method improves cross-lingual transferability on various datasets, especially on the token-level tasks, such as question answering, and structured prediction. Moreover, the model can serve as a pretrained word aligner, which achieves reasonably low error rates on the alignment benchmarks. The code and pretrained…
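The two alternating steps in the abstract (self-label alignments, then predict the aligned token for a masked position with a pointer network) can be illustrated with a toy sketch. Everything below is an assumption for illustration only: the function names are hypothetical, the token vectors stand in for encoder hidden states, and the mutual-argmax labeler is a simple stand-in, not the self-labeling procedure used in the paper.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_label(src: List[Vector], tgt: List[Vector]) -> Dict[int, int]:
    """Stand-in labeler (assumption): keep pairs (i, j) that are each
    other's best match under dot-product similarity."""
    sim = [[dot(s, t) for t in tgt] for s in src]
    fwd = [max(range(len(tgt)), key=lambda j: sim[i][j]) for i in range(len(src))]
    bwd = [max(range(len(src)), key=lambda i: sim[i][j]) for j in range(len(tgt))]
    return {i: j for i, j in enumerate(fwd) if bwd[j] == i}


def pointer_loss(query: Vector, keys: List[Vector], gold: int) -> float:
    """Cross-entropy of a pointer distribution over the other sentence's
    tokens: softmax of scaled dot products between query and keys."""
    scale = math.sqrt(len(query))
    probs = softmax([dot(query, k) / scale for k in keys])
    return -math.log(probs[gold])


def denoising_alignment_loss(
    clean_src: List[Vector],
    clean_tgt: List[Vector],
    masked_src: List[Vector],
    masked_positions: List[int],
) -> float:
    """Step 1: label alignments on the unmasked pair.
    Step 2: for each masked source position that has a label, score how
    well its (masked-input) state points at the aligned target token."""
    labels = self_label(clean_src, clean_tgt)
    losses = [
        pointer_loss(masked_src[i], clean_tgt, labels[i])
        for i in masked_positions
        if i in labels
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Toy bitext: source token 0 matches target token 1, and vice versa.
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[0.0, 1.0], [1.0, 0.0]]
# Hypothetical states for the same source sentence with position 0 masked.
masked = [[0.6, 0.2], [0.0, 1.0]]

labels = self_label(src, tgt)
loss = denoising_alignment_loss(src, tgt, masked, masked_positions=[0])
```

In a real training loop the labels would be refreshed with the current model and the pointer loss minimized by gradient descent, alternating between the two as the abstract describes. The sketch computes only a single loss value and does no parameter updates.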

Code & Models

Repositories

CZWin32768/XLM-Align
PyTorch · Official

Models

🤗 CZWin32768/xlm-align
model · 3 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Softmax · Sigmoid Activation · Long Short-Term Memory · Tanh Activation · Pointer Network