Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training

Kelly Marchisio; Patrick Lewis; Yihong Chen; Mikel Artetxe

arXiv:2212.10503·cs.CL·July 6, 2023

TL;DR

This paper introduces mini-model adaptation, a compute-efficient method for extending pretrained language models to new languages by training small, shallow models that are aligned with the large model, enabling rapid cross-lingual transfer.

Contribution

The paper proposes two approaches, MiniJoint and MiniPost, for building and training mini-models. Both reduce the compute needed to learn embeddings for a new language while matching the performance of the standard approach.

Findings

01 Mini-model adaptation matches the performance of the standard approach.

02 It uses 2.3x less compute on average.

03 It is effective for cross-lingual transfer on multiple datasets.

Abstract

Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. We propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model's parameters. New language-specific embeddings can then be efficiently trained over the mini-model and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MiniJoint, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MiniPost, where we start from…
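The compute argument in the abstract can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the authors' measurement: the layer counts (12 for the full model, 4 for the mini-model) and the function names are hypothetical, and the model counts only transformer layer passes, ignoring the cost of the embeddings and the MLM head. It shows why freezing the transformer body does not by itself save compute: the gradient still has to travel back through every layer to reach the embeddings at the bottom.

```python
# Toy cost model (hypothetical numbers): layer passes needed per training
# step when only the input embeddings are being learned.

FULL_DEPTH = 12  # assumed depth of the large pretrained model
MINI_DEPTH = 4   # assumed depth of the shallow mini-model


def layer_passes_per_step(depth: int) -> int:
    """Count layer passes for one step of embedding training.

    Even when the layers are frozen, each one is traversed once in the
    forward pass and once in the backward pass, because the gradient must
    flow through it to reach the embeddings below.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return 2 * depth


def relative_saving(full_depth: int, mini_depth: int) -> float:
    """Ratio of layer passes: training over the full model vs. the mini-model."""
    return layer_passes_per_step(full_depth) / layer_passes_per_step(mini_depth)


if __name__ == "__main__":
    print("full model passes per step:", layer_passes_per_step(FULL_DEPTH))
    print("mini-model passes per step:", layer_passes_per_step(MINI_DEPTH))
    print("layer-pass ratio:", relative_saving(FULL_DEPTH, MINI_DEPTH))
```

The ratio printed here is an upper bound for these assumed depths, since the embedding lookup and output head cost the same in both settings; it should not be read as the saving reported in the paper.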

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques