From English To Foreign Languages: Transferring Pre-trained Language Models

Ke Tran

arXiv:2002.07306 · cs.CL · April 30, 2020 · 23 cites

TL;DR

This paper presents a method for efficiently transferring pre-trained English language models to other languages using limited computational resources, achieving competitive results on zero-shot NLP tasks.

Contribution

It introduces a fast transfer approach that adapts existing models to new languages within a day or two on a single GPU, outperforming multilingual BERT on certain tasks.

Findings

01. Transferred models outperform multilingual BERT on zero-shot NLP tasks
02. Efficient transfer process requires only one or two days on a single GPU
03. Models achieve competitive performance across six languages

Abstract

Pre-trained models have demonstrated their effectiveness in many downstream natural language processing (NLP) tasks. The availability of multilingual pre-trained models enables zero-shot transfer of NLP tasks from high resource languages to low resource ones. However, recent research in improving pre-trained models focuses heavily on English. While it is possible to train the latest neural architectures for other languages from scratch, it is undesirable due to the required amount of compute. In this work, we tackle the problem of transferring an existing pre-trained model from English to other languages under a limited computational budget. With a single GPU, our approach can obtain a foreign BERT base model within a day and a foreign BERT large within two days. Furthermore, evaluating our models on six languages, we demonstrate that our models are better than multilingual BERT on two…

Code & Models

Repositories

alexa/ramen (PyTorch)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax