Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment
Zewen Chi, Li Dong, Bo Zheng, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei

TL;DR
This paper proposes a self-labeling pre-training approach for cross-lingual language models that improves token-level cross-lingual transfer over previous methods and also turns the model into an effective word aligner.
Contribution
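Introduces denoising word alignment as a pre-training task. The model alternately self-labels word alignments on parallel sentences and learns to predict the aligned counterparts of masked tokens, an EM-style training loop that improves both cross-lingual transfer and alignment accuracy.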
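The self-labeling half of this EM-style loop can be sketched as follows. This is a minimal sketch, assuming alignment labels are read off cosine similarities between the model's contextual token vectors using a mutual-argmax rule; `cosine` and `self_label_alignments` are hypothetical helpers, and the paper's actual labeling procedure may differ.

```python
import math
from typing import List, Set, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def self_label_alignments(src: List[Vector], tgt: List[Vector]) -> Set[Tuple[int, int]]:
    """Label (i, j) as aligned when source token i and target token j pick each
    other as their most similar counterpart (mutual argmax).

    Assumption: mutual argmax is an illustrative stand-in, not necessarily the
    labeling rule used in the paper.
    """
    sim = [[cosine(s, t) for t in tgt] for s in src]
    # Best target position for every source token.
    best_tgt = [max(range(len(row)), key=row.__getitem__) for row in sim]
    # Best source position for every target token.
    best_src = [max(range(len(src)), key=lambda i: sim[i][j]) for j in range(len(tgt))]
    # Keep only links on which both directions agree.
    return {(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i}
```

For toy vectors whose directions are swapped across the two sentences, e.g. `src = [[1.0, 0.0], [0.0, 1.0]]` and `tgt = [[0.0, 0.9], [0.95, 0.1]]`, the helper returns `{(0, 1), (1, 0)}`. Requiring agreement in both directions trades recall for precision, which keeps noisy links out of the labels the next training step learns from.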
Findings
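Improves cross-lingual transfer, especially on token-level tasks such as question answering and structured prediction.
Achieves reasonably low error rates on word alignment benchmarks when used as a pretrained word aligner.
Strengthens pretrained cross-lingual models using word alignments that the model labels itself.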
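Word alignment benchmarks are commonly scored with alignment error rate (AER), which compares predicted links A against gold "sure" links S and "possible" links P: AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The page does not name the metric, so reading "error rates" as AER is an assumption; `alignment_error_rate` below is a hypothetical helper implementing the standard formula.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source position, target position)


def alignment_error_rate(pred: Set[Link], sure: Set[Link], possible: Set[Link]) -> float:
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|); lower is better."""
    possible = possible | sure  # by convention every sure link also counts as possible
    denom = len(pred) + len(sure)
    if denom == 0:
        return 0.0  # nothing predicted and nothing to find
    return 1.0 - (len(pred & sure) + len(pred & possible)) / denom
```

For example, with sure links `{(0, 0), (1, 1)}`, an extra possible link `{(2, 1)}` and predictions `{(0, 0), (1, 1), (2, 2)}`, the score is 1 - 4/5, about 0.2: the spurious link `(2, 2)` is the only thing penalized.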
Abstract
Cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences. In this paper, we introduce denoising word alignment as a new cross-lingual pre-training task. Specifically, the model first self-labels word alignments for parallel sentences. Then we randomly mask tokens in a bitext pair. Given a masked token, the model uses a pointer network to predict the aligned token in the other language. We alternately perform the above two steps in an expectation-maximization manner. Experimental results show that our method improves cross-lingual transferability on various datasets, especially on token-level tasks such as question answering and structured prediction. Moreover, the model can serve as a pretrained word aligner, which achieves reasonably low error rates on the alignment benchmarks. The code and pretrained…
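The pointer-network prediction step can be sketched as a softmax over the target-side positions followed by a negative log-likelihood on the self-labeled aligned position. This is a minimal sketch, assuming the query is the hidden state at the masked source token, the keys are hidden states of the other-language tokens, and scores are plain dot products; `pointer_distribution` and `denoising_alignment_loss` are hypothetical names, and the paper's network may add projections or scaling.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def pointer_distribution(query: Vector, keys: List[Vector]) -> List[float]:
    """Softmax over dot-product scores: one probability per candidate target position."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def denoising_alignment_loss(query: Vector, keys: List[Vector], aligned_pos: int) -> float:
    """Negative log-likelihood of pointing at the self-labeled aligned position."""
    probs = pointer_distribution(query, keys)
    return -math.log(probs[aligned_pos])
```

With `query = [1.0, 0.0]` and `keys = [[2.0, 0.0], [0.0, 2.0]]`, pointing at position 0 costs log(1 + e^-2), roughly 0.127 nats, while pointing at position 1 costs more. Minimizing this loss with the alignment labels held fixed plays the role of the maximization step between rounds of self-labeling.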
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Softmax · Sigmoid Activation · Long Short-Term Memory · Tanh Activation · Pointer Network
