Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang

TL;DR
Mixout is a novel regularization method inspired by dropout that improves the stability and accuracy of fine-tuning large pretrained language models, especially with limited training data.
Contribution
The paper introduces mixout, a new regularization technique that adaptively stabilizes fine-tuning of large language models, enhancing performance on downstream tasks.
Findings
Mixout improves fine-tuning stability.
Mixout increases average accuracy on GLUE tasks.
Mixout adapts regularization strength during training.
Abstract
In natural language processing, it has been observed recently that generalization could be greatly improved by finetuning a large-scale language model pretrained on a large unlabeled corpus. Despite its recent success and wide adoption, finetuning a large pretrained language model on a downstream task is prone to degenerate performance when there are only a small number of training instances available. In this paper, we introduce a new regularization technique, to which we refer as "mixout", motivated by dropout. Mixout stochastically mixes the parameters of two models. We show that our mixout technique regularizes learning to minimize the deviation from one of the two models and that the strength of regularization adapts along the optimization trajectory. We empirically evaluate the proposed mixout and its variants on finetuning a pretrained language model on downstream tasks.
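The abstract describes the mixing step only in words. Below is a minimal NumPy sketch of one way to realize it. It assumes the rescaled form w_mix = (M * w + (1 - M) * u - p * u) / (1 - p), where M is drawn elementwise from Bernoulli(1 - p). Each coordinate of the current parameters w is swapped for the matching target parameter u (for finetuning, the pretrained weights) with probability p, and the rescaling keeps the expected value equal to w. The helper name `mixout` and the toy vectors are hypothetical and for illustration only. This is not the authors' code. A real finetuning setup would apply the operation to layer weights during training rather than to a standalone vector.

```python
import numpy as np


def mixout(w, u, p, rng):
    """Hypothetical helper: stochastically mix current parameters w with
    target parameters u (e.g. pretrained weights).

    Each coordinate keeps w[i] with probability 1 - p and is replaced by u[i]
    with probability p; the result is rescaled so that its expectation is w.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    # mask[i] == 1 keeps w[i]; mask[i] == 0 swaps in u[i]
    mask = rng.binomial(1, 1.0 - p, size=w.shape).astype(w.dtype)
    mixed = mask * w + (1.0 - mask) * u
    # Subtract p * u and divide by (1 - p) so that E[result] == w
    return (mixed - p * u) / (1.0 - p)


rng = np.random.default_rng(0)
u = np.array([0.5, -1.0, 2.0])   # toy stand-in for pretrained weights
w = np.array([0.7, -0.8, 1.5])   # toy stand-in for current finetuned weights

print(mixout(w, u, p=0.9, rng=rng))  # one stochastic draw

# Monte Carlo check: the average over many draws should be close to w
samples = np.stack([mixout(w, u, 0.9, rng) for _ in range(20000)])
print(samples.mean(axis=0))
```

Rearranging the formula gives u + M * (w - u) / (1 - p). This is ordinary dropout applied to the difference w - u and then shifted back by u, so with u = 0 it reduces to standard dropout. That view is consistent with the abstract's claim that mixout regularizes learning toward one of the two models.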
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
