Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using Multilingual BERT

Beiduo Chen; Wu Guo; Quan Liu; Kun Tao

arXiv:2205.08497·cs.CL·May 18, 2022

TL;DR

This paper introduces a feature aggregation method that combines information from multiple layers of multilingual BERT (mBERT) to improve zero-shot cross-lingual transfer, reporting performance gains on XNLI, PAWS-X, NER, and POS tasks.

Contribution

The paper proposes an attention-based feature aggregation module that draws on the lower layers of mBERT, improving cross-lingual task performance over using the last layer's output alone.

Findings

01. Performance improvements on XNLI, PAWS-X, NER, and POS tasks.

02. Lower layers of mBERT contain useful information for cross-lingual transfer.

03. Enhanced interpretability of mBERT layers through analysis.

Abstract

Multilingual BERT (mBERT), a language model pre-trained on large multilingual corpora, has impressive zero-shot cross-lingual transfer capabilities and performs surprisingly well on zero-shot POS tagging and Named Entity Recognition (NER), as well as on cross-lingual model transfer. At present, mainstream methods for cross-lingual downstream tasks always use the output of mBERT's last transformer layer as the representation of linguistic information. In this work, we explore how the lower layers of mBERT complement the last transformer layer. A feature aggregation module based on an attention mechanism is proposed to fuse the information contained in different layers of mBERT. The experiments are conducted on four zero-shot cross-lingual transfer datasets, and the proposed method obtains performance improvements on key multilingual benchmark tasks XNLI…
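The idea of fusing layer outputs with attention can be illustrated with a minimal sketch. This is not the authors' module: the abstract does not give its formulation, so the sketch assumes a simple scaled dot-product attention in which the last layer's vector for a token acts as the query over all layers' vectors for that token. The function name `aggregate_layers` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def aggregate_layers(layer_vectors):
    """Fuse one token's per-layer vectors into a single vector.

    Assumption (not from the paper): the last layer's vector is the
    query, and every layer's vector serves as both key and value.
    Returns the fused vector and the attention weight of each layer.
    """
    query = layer_vectors[-1]
    dim = len(query)
    scale = math.sqrt(dim)
    scores = [dot(query, vec) / scale for vec in layer_vectors]
    weights = softmax(scores)
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, layer_vectors))
        for i in range(dim)
    ]
    return fused, weights


# Toy example: three "layers", each a 2-dimensional vector for one token.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = aggregate_layers(layers)
```

In this toy case the last layer receives the largest weight because it matches the query exactly, while the two lower layers still contribute to the fused vector. A trained module would typically learn projections for the query, keys, and values rather than use the raw vectors.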

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Attention Dropout · Layer Normalization · Dropout · Dense Connections · Adam