From English To Foreign Languages: Transferring Pre-trained Language Models
Ke Tran

TL;DR
This paper presents a method for efficiently transferring pre-trained English language models to other languages using limited computational resources, achieving competitive results on zero-shot NLP tasks.
Contribution
It introduces a fast transfer approach that adapts existing models to new languages within a day or two on a single GPU, outperforming multilingual BERT on certain tasks.
Findings
Transferred models outperform multilingual BERT on zero-shot NLP tasks
Efficient transfer process requires only one or two days on a single GPU
Models achieve competitive performance across six languages
Abstract
Pre-trained models have demonstrated their effectiveness in many downstream natural language processing (NLP) tasks. The availability of multilingual pre-trained models enables zero-shot transfer of NLP tasks from high-resource languages to low-resource ones. However, recent research in improving pre-trained models focuses heavily on English. While it is possible to train the latest neural architectures for other languages from scratch, it is undesirable due to the required amount of compute. In this work, we tackle the problem of transferring an existing pre-trained model from English to other languages under a limited computational budget. With a single GPU, our approach can obtain a foreign BERT base model within a day and a foreign BERT large within two days. Furthermore, evaluating our models on six languages, we demonstrate that our models are better than multilingual BERT on two…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
