SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Haoming Jiang; Pengcheng He; Weizhu Chen; Xiaodong Liu; Jianfeng Gao; Tuo Zhao

arXiv:1911.03437 · cs.CL · September 10, 2021

TL;DR

This paper introduces a new fine-tuning framework for pre-trained NLP models that improves robustness and efficiency by combining smoothness-inducing regularization with trust-region (Bregman proximal point) optimization, mitigating overfitting and the forgetting of pre-trained knowledge.

Contribution

It proposes a computational framework that integrates smoothness-inducing regularization with Bregman proximal point optimization to improve the fine-tuning of pre-trained language models.
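One way to write the two ingredients down, offered as a hedged sketch rather than the paper's exact formulation: the notation below (model f, task loss ℓ, output divergence ℓ_s such as symmetric KL, weights λ_s and μ, perturbation radius ε) is assumed here. The smoothness-inducing term penalizes how much the model's output can change under a small input perturbation, and the proximal point step keeps each new iterate close, in output space, to the previous one, which is what makes it behave like a trust region.

```latex
\begin{aligned}
\min_{\theta}\; \mathcal{F}(\theta) &= \mathcal{L}(\theta) + \lambda_s \mathcal{R}_s(\theta),
\qquad
\mathcal{L}(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i;\theta),\, y_i\big), \\
\mathcal{R}_s(\theta) &= \frac{1}{n}\sum_{i=1}^{n} \max_{\|\tilde{x}_i - x_i\|_p \le \epsilon} \ell_s\big(f(\tilde{x}_i;\theta),\, f(x_i;\theta)\big), \\
\theta_{t+1} &= \arg\min_{\theta}\; \mathcal{F}(\theta) + \mu\, \mathcal{D}_{\mathrm{Breg}}(\theta, \theta_t),
\qquad
\mathcal{D}_{\mathrm{Breg}}(\theta, \theta_t) = \frac{1}{n}\sum_{i=1}^{n} \ell_s\big(f(x_i;\theta),\, f(x_i;\theta_t)\big).
\end{aligned}
```

Setting μ = 0 recovers plain regularized fine-tuning; a large μ forces each update to stay near the previous iterate's predictions.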

Findings

1. Achieves state-of-the-art results on multiple NLP benchmarks.
2. Effectively prevents overfitting during fine-tuning.
3. Reduces forgetting of the knowledge captured by pre-trained models.

Abstract

Transfer learning has fundamentally changed the landscape of natural language processing (NLP) research. Many existing state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model. To address the above issue in a more principled manner, we propose a new computational framework for robust and efficient fine-tuning for pre-trained language models. Specifically, our proposed framework contains two important ingredients: 1. Smoothness-inducing regularization, which effectively manages the capacity of the model; 2. Bregman proximal point optimization, which is a class of trust-region…
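The two ingredients named in the abstract can be illustrated on a toy problem. This is a minimal sketch, not the authors' implementation, and it assumes: a binary logistic-regression model f(x; θ) = sigmoid(θ·x) standing in for a pre-trained network, ℓ∞-ball perturbations of the inputs, symmetric KL as the output divergence, and a few steps of projected sign-gradient ascent for the inner maximization. All function names (`smoothness_regularizer`, `bregman_divergence`, `smart_style_objective`) and defaults (λ_s, μ, ε, step counts) are hypothetical; gradients with respect to θ and the outer optimizer are omitted.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def sym_kl_from_logits(z1, z2):
    # Symmetric KL between Bernoulli(sigmoid(z1)) and Bernoulli(sigmoid(z2)).
    # For Bernoulli p, q: KL(p||q) + KL(q||p) = (p - q) * (logit(p) - logit(q)).
    return (sigmoid(z1) - sigmoid(z2)) * (z1 - z2)


def cross_entropy(theta, x, y):
    # Logistic loss for label y in {0, 1}: softplus(-z) if y == 1, softplus(z) if y == 0.
    z = dot(theta, x)
    s = -z if y == 1 else z
    return max(s, 0.0) + math.log1p(math.exp(-abs(s)))


def adversarial_sym_kl(theta, x, eps, steps, step_size, rng):
    # Approximate max over ||delta||_inf <= eps of sym_kl(f(x + delta), f(x))
    # via projected sign-gradient ascent. The gradient is exactly zero at
    # delta = 0, so the search starts from a small random perturbation.
    z = dot(theta, x)
    p = sigmoid(z)
    delta = [rng.uniform(-0.1 * eps, 0.1 * eps) for _ in x]
    for _ in range(steps):
        z_adv = z + dot(theta, delta)
        p_adv = sigmoid(z_adv)
        # Derivative of (sigmoid(z_adv) - p) * (z_adv - z) with respect to z_adv.
        g = p_adv * (1.0 - p_adv) * (z_adv - z) + (p_adv - p)
        # Chain rule (d z_adv / d delta_j = theta_j), sign step, then clip to the eps-ball.
        delta = [min(eps, max(-eps, d + step_size * math.copysign(1.0, g * t)))
                 for d, t in zip(delta, theta)]
    return sym_kl_from_logits(z + dot(theta, delta), z)


def smoothness_regularizer(theta, xs, eps, steps=5, step_size=None, seed=0):
    # R_s(theta): mean worst-case symmetric KL within the eps-ball around each input.
    rng = random.Random(seed)
    if step_size is None:
        step_size = 0.5 * eps
    return sum(adversarial_sym_kl(theta, x, eps, steps, step_size, rng) for x in xs) / len(xs)


def bregman_divergence(theta, theta_prev, xs):
    # Proximity term: mean symmetric KL between current and previous predictions.
    return sum(sym_kl_from_logits(dot(theta, x), dot(theta_prev, x)) for x in xs) / len(xs)


def smart_style_objective(theta, theta_prev, xs, ys, lambda_s=1.0, mu=1.0, eps=0.1):
    # Task loss + lambda_s * smoothness term + mu * proximal (trust-region) term.
    task = sum(cross_entropy(theta, x, y) for x, y in zip(xs, ys)) / len(xs)
    return (task
            + lambda_s * smoothness_regularizer(theta, xs, eps)
            + mu * bregman_divergence(theta, theta_prev, xs))


if __name__ == "__main__":
    xs = [[1.0, 0.5], [-0.3, 1.2], [0.8, -1.0]]
    ys = [1, 0, 1]
    print(smart_style_objective([0.3, -0.2], [0.2, -0.1], xs, ys))
```

The proximal term is zero when θ equals the previous iterate θ_t and grows as the new model's predictions drift away from it, which is the sense in which it limits how far one update can move. For this linear model the inner maximum has a closed form (the perturbation saturates at ±ε·‖θ‖₁ in logit space); the iterative search is kept because it is the part that carries over to models without such a shortcut.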
