TL;DR
This paper introduces a fine-tuning framework for pre-trained NLP models that improves robustness and efficiency by combining smoothness-inducing regularization with trust-region (Bregman proximal point) optimization. Together these mitigate overfitting to downstream data and forgetting of pre-trained knowledge.
Contribution
It proposes a computational framework that integrates smoothness-inducing regularization with Bregman proximal point optimization to fine-tune pre-trained language models more robustly.
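For orientation, here is a hedged sketch of how these two ingredients are commonly written together. The notation is assumed and not defined in this summary: f(x; θ) is the model output, L(θ) the task loss, ℓ_s a symmetric KL divergence, ε the perturbation radius, and λ_s, μ weighting coefficients. The first line is the regularized objective; the second is one Bregman proximal point (trust-region) update around the previous iterate θ_t.

```latex
\begin{aligned}
\min_{\theta}\; \mathcal{F}(\theta) &= \mathcal{L}(\theta) + \lambda_s \mathcal{R}_s(\theta),
\qquad
\mathcal{R}_s(\theta) = \frac{1}{n}\sum_{i=1}^{n} \max_{\|\tilde{x}_i - x_i\|_p \le \epsilon} \ell_s\big(f(\tilde{x}_i;\theta),\, f(x_i;\theta)\big) \\
\theta_{t+1} &= \arg\min_{\theta}\; \mathcal{F}(\theta) + \mu\, \mathcal{D}_{\mathrm{Breg}}(\theta, \theta_t),
\qquad
\mathcal{D}_{\mathrm{Breg}}(\theta, \theta_t) = \frac{1}{n}\sum_{i=1}^{n} \ell_s\big(f(x_i;\theta),\, f(x_i;\theta_t)\big)
\end{aligned}
```

The Bregman term penalizes moving the model's predictions far from those of the previous iterate, which is what makes each step behave like a trust-region update.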
Findings
Achieves state-of-the-art results on multiple NLP benchmarks.
Mitigates overfitting to downstream data during fine-tuning.
Reduces forgetting of the knowledge learned during pre-training.
Abstract
Transfer learning has fundamentally changed the landscape of natural language processing (NLP) research. Many existing state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model. To address the above issue in a more principled manner, we propose a new computational framework for robust and efficient fine-tuning for pre-trained language models. Specifically, our proposed framework contains two important ingredients: 1. Smoothness-inducing regularization, which effectively manages the capacity of the model; 2. Bregman proximal point optimization, which is a class of trust-region…
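To make the two ingredients concrete, the NumPy sketch below computes a regularized fine-tuning objective for a toy model. Everything here is an illustrative assumption, not the authors' code: the model is a hypothetical linear softmax classifier (`predict`), the divergence is symmetric KL, and the inner maximization of the smoothness term uses simple random search over an ℓ∞ ball as a stand-in for the gradient-based ascent a real implementation would use. The Bregman term compares current predictions with those of the previous parameters `W_prev`.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def symmetric_kl(p, q, tiny=1e-12):
    # KL(p||q) + KL(q||p) per row, which simplifies to sum((p - q) * (log p - log q)).
    p = np.clip(p, tiny, 1.0)
    q = np.clip(q, tiny, 1.0)
    return np.sum((p - q) * (np.log(p) - np.log(q)), axis=-1)

def predict(W, x):
    # HYPOTHETICAL model f(x; W): a linear softmax classifier over input embeddings x.
    return softmax(x @ W)

def smoothness_penalty(W, x, eps, n_samples=16, rng=None):
    # Smoothness-inducing term: worst-case symmetric KL between f(x + delta) and f(x)
    # over perturbations with ||delta||_inf <= eps.
    # ASSUMPTION: random search replaces gradient ascent for the inner max.
    rng = np.random.default_rng(0) if rng is None else rng
    clean = predict(W, x)
    worst = np.zeros(x.shape[0])
    for _ in range(n_samples):
        delta = rng.uniform(-eps, eps, size=x.shape)
        worst = np.maximum(worst, symmetric_kl(predict(W, x + delta), clean))
    return worst.mean()

def bregman_penalty(W, W_prev, x):
    # Trust-region term: mean symmetric KL between current and previous-iterate predictions.
    return symmetric_kl(predict(W, x), predict(W_prev, x)).mean()

def regularized_objective(W, W_prev, x, y, lambda_s=1.0, mu=1.0, eps=1e-2):
    # Cross-entropy task loss plus the two regularizers (weights are illustrative).
    probs = predict(W, x)
    ce = -np.mean(np.log(np.clip(probs[np.arange(len(y)), y], 1e-12, 1.0)))
    return ce + lambda_s * smoothness_penalty(W, x, eps) + mu * bregman_penalty(W, W_prev, x)

# Usage on random toy data: 8 examples, 5 features, 3 classes.
rng = np.random.default_rng(42)
x = rng.normal(size=(8, 5))
y = rng.integers(0, 3, size=8)
W_prev = rng.normal(size=(5, 3))
W = W_prev + 0.1 * rng.normal(size=(5, 3))
print(regularized_objective(W, W_prev, x, y))
```

Raising `mu` keeps each update closer to the previous iterate's predictions, while raising `lambda_s` or `eps` pushes the model toward outputs that change little under small input perturbations.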
