AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning

Yaqing Wang; Sahaj Agarwal; Subhabrata Mukherjee; Xiaodong Liu; Jing Gao; Ahmed Hassan Awadallah; Jianfeng Gao

arXiv:2210.17451 · cs.CL · November 3, 2022 · 1 citation

Open Access · 1 Repo

TL;DR

AdaMix is a parameter-efficient fine-tuning (PEFT) method that tunes a mixture of adaptation modules in each Transformer layer of a large pre-trained language model, improving downstream task performance over the underlying PEFT method while tuning only 0.1-0.2% of the parameters.

Contribution

The paper proposes AdaMix, a general PEFT method that introduces a mixture of adaptation modules (for example, Houlsby adapters or LoRA low-rank decomposition matrices) in each Transformer layer while keeping most of the PLM weights frozen.

Findings

01

AdaMix outperforms state-of-the-art PEFT methods and full model fine-tuning on NLU and NLG tasks.

02

AdaMix achieves these gains while tuning only 0.1-0.2% of the PLM's parameters.

03

AdaMix matches the computational cost of the underlying PEFT method.

Abstract

Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updating hundreds of millions to billions of parameters and storing a large copy of the PLM weights for every task, resulting in increased costs for storing, sharing, and serving the models. To address this, parameter-efficient fine-tuning (PEFT) techniques were introduced, in which small trainable components are injected into the PLM and updated during fine-tuning. We propose AdaMix as a general PEFT method that tunes a mixture of adaptation modules -- given the underlying PEFT method of choice -- introduced in each Transformer layer while keeping most of the PLM weights frozen. For instance, AdaMix can leverage a mixture of adapters like Houlsby or a mixture of low-rank decomposition matrices like LoRA to improve downstream task performance over the corresponding PEFT methods for fully supervised…
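
The mixture idea described in the abstract can be illustrated with a minimal Python sketch, assuming a LoRA-style setup: one frozen weight matrix W plus several small trainable low-rank pairs (B_i, A_i). The class name MixtureOfLowRankUpdates, the random choice of one module per training pass, and the averaging of module outputs at inference are all assumptions made for illustration. The excerpt above does not say how the modules are combined, and this is not the code from microsoft/AdaMix.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; the init scheme is an arbitrary choice for this sketch."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class MixtureOfLowRankUpdates:
    """Hypothetical layer: frozen weight W plus several low-rank modules (B_i, A_i)."""

    def __init__(self, frozen_weight, rank, num_modules, seed=0):
        self.rng = random.Random(seed)
        self.W = frozen_weight  # d_out x d_in, never modified by this class
        d_out, d_in = len(frozen_weight), len(frozen_weight[0])
        # Each module holds B (d_out x rank) and A (rank x d_in).
        self.modules = [
            (rand_matrix(d_out, rank, self.rng), rand_matrix(rank, d_in, self.rng))
            for _ in range(num_modules)
        ]

    def trainable_parameters(self):
        """Count only the entries of the low-rank modules, not the frozen weight."""
        return sum(len(B) * len(B[0]) + len(A) * len(A[0]) for B, A in self.modules)

    def forward_with(self, x, module):
        """Compute W x + B (A x) for one specific module."""
        B, A = module
        base = matvec(self.W, x)
        update = matvec(B, matvec(A, x))
        return [b + u for b, u in zip(base, update)]

    def forward_train(self, x):
        """Assumed training-time behaviour: use one randomly chosen module."""
        return self.forward_with(x, self.rng.choice(self.modules))

    def forward_average(self, x):
        """Assumed inference-time behaviour: average the outputs of all modules."""
        outputs = [self.forward_with(x, m) for m in self.modules]
        return [sum(vals) / len(outputs) for vals in zip(*outputs)]


if __name__ == "__main__":
    # Toy 4x4 identity as the frozen weight; sizes are illustrative only.
    W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    layer = MixtureOfLowRankUpdates(W, rank=1, num_modules=3)
    x = [1.0, 2.0, 3.0, 4.0]
    print("trainable parameters:", layer.trainable_parameters())
    print("training-style output:", layer.forward_train(x))
    print("averaged output:", layer.forward_average(x))
```

Because this toy layer is linear, averaging the module outputs gives the same result as adding the averaged low-rank update to W once, so the averaged form costs no more at inference than a single module would. The toy sizes here are not representative of the 0.1-0.2% figure reported for real PLMs.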

Code & Models

Repositories

microsoft/AdaMix
jax · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Layer Normalization