Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training
Kelly Marchisio, Patrick Lewis, Yihong Chen, Mikel Artetxe

TL;DR
This paper introduces mini-model adaptation, a compute-efficient method for extending pretrained language models to new languages by training small, shallow models that are aligned with the large model, enabling rapid cross-lingual transfer.
Contribution
The paper proposes two approaches, MiniJoint and MiniPost, for building and training mini-models that reduce compute cost while matching the performance of the standard approach.
Findings
Mini-model adaptation matches the performance of the standard approach.
It uses 2.3x less compute on average.
It is effective for cross-lingual transfer on multiple datasets.
Abstract
Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. We propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model's parameters. New language-specific embeddings can then be efficiently trained over the mini-model and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MiniJoint, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MiniPost, where we start from…
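The workflow described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the layers are arithmetic stand-ins for frozen transformer layers, the MLM heads and the actual embedding update are omitted, and the names ToyModel and make_layer, the depths 12 and 4, and the token "hola" are all assumptions made for illustration. It follows the MiniJoint description, where the mini-model reuses the bottom layers of the same transformer, and it counts layer calls as a crude proxy for compute.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_layer(scale: float) -> Layer:
    """Toy stand-in for one frozen transformer layer (hypothetical)."""
    return lambda hidden: [scale * x + 1.0 for x in hidden]


@dataclass
class ToyModel:
    embeddings: Dict[str, Vector]  # language-specific, the only trainable part
    layers: List[Layer]            # shared body, kept frozen
    layer_calls: int = 0           # crude proxy for compute spent

    def forward(self, token: str) -> Vector:
        hidden = self.embeddings[token]
        for layer in self.layers:
            hidden = layer(hidden)
            self.layer_calls += 1
        return hidden


# Assumed depths, chosen only for illustration.
FULL_DEPTH, MINI_DEPTH = 12, 4
body = [make_layer(0.9) for _ in range(FULL_DEPTH)]

# The mini-model reuses the bottom layers of the same body,
# so embeddings trained over it stay aligned with the large model.
new_embeddings = {"hola": [0.1, 0.2]}
mini = ToyModel(new_embeddings, body[:MINI_DEPTH])
full = ToyModel(new_embeddings, body)

# "Training" passes run over the shallow mini-model only
# (the gradient update itself is omitted in this sketch).
for _ in range(100):
    mini.forward("hola")

# The same embeddings are then plugged into the full model for transfer.
output = full.forward("hola")
print(mini.layer_calls, full.layer_calls)
```

In this toy setup each training pass touches 4 layers instead of 12. The real saving depends on the model, the heads, and the training schedule, so the ratio here should not be read as the paper's reported figure.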
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
