Learning Multilingual Representation for Natural Language Understanding with Enhanced Cross-Lingual Supervision
Yinpeng Guo, Liangyou Li, Xin Jiang, Qun Liu

TL;DR
This paper introduces a decomposed attention architecture with intra-lingual and cross-lingual modules, along with a language-adaptive re-weighting strategy, to enhance multilingual representations for natural language understanding tasks.
Contribution
It proposes a novel decomposed attention mechanism and training strategy that improve cross-lingual transferability over traditional mixed attention models.
Findings
Significant improvement in cross-lingual transfer performance
Enhanced model ability to handle multiple languages
Better alignment of intra- and cross-lingual representations
Abstract
Recently, pre-training multilingual language models has shown great potential in learning multilingual representation, a crucial topic of natural language processing. Prior works generally use a single mixed attention (MA) module, following TLM (Conneau and Lample, 2019), for attending to intra-lingual and cross-lingual contexts equivalently and simultaneously. In this paper, we propose a network named decomposed attention (DA) as a replacement for MA. The DA consists of an intra-lingual attention (IA) and a cross-lingual attention (CA), which model intra-lingual and cross-lingual supervision respectively. In addition, we introduce a language-adaptive re-weighting strategy during training to further boost the model's performance. Experiments on various cross-lingual natural language understanding (NLU) tasks show that the proposed architecture and learning strategy significantly improve…
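The difference between MA and DA described in the abstract can be illustrated with attention masks over a concatenated translation pair. The following is only a minimal sketch, not the authors' implementation: the function names (`masked_attention`, `attention_masks`), the single-head NumPy formulation, the toy dimensions, and the choice to apply IA first and then CA are all assumptions made for illustration.

```python
import numpy as np

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention.

    mask[i, j] is True where query i is allowed to attend to key j.
    (Hypothetical helper; not taken from the paper.)
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)            # block disallowed positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def attention_masks(lang_ids):
    """Build mixed, intra-lingual, and cross-lingual masks from per-token language ids."""
    lang_ids = np.asarray(lang_ids)
    same = lang_ids[:, None] == lang_ids[None, :]
    mixed = np.ones_like(same, dtype=bool)   # MA: every token sees every token
    return mixed, same, ~same                # IA: same language; CA: other language

# Toy translation pair: 3 tokens in language 0, 2 tokens in language 1.
rng = np.random.default_rng(0)
lang_ids = [0, 0, 0, 1, 1]
x = rng.normal(size=(5, 8))

mixed, intra, cross = attention_masks(lang_ids)

# Mixed attention: one module over both contexts at once.
ma_out, ma_w = masked_attention(x, x, x, mixed)

# Decomposed attention (assumed ordering): intra-lingual first, then cross-lingual.
ia_out, ia_w = masked_attention(x, x, x, intra)
ca_out, ca_w = masked_attention(ia_out, x, x, cross)
```

In this sketch the IA weights are zero on tokens of the other language and the CA weights are zero on tokens of the same language, whereas the MA weights spread over both in a single softmax.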
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
