Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization

Hang Hua; Xingjian Li; Dejing Dou; Cheng-Zhong Xu; Jiebo Luo

arXiv:2206.05658·cs.CL·November 13, 2023

TL;DR

This paper introduces Layerwise Noise Stability Regularization (LNSR), a novel fine-tuning method for pre-trained language models that injects noise into hidden representations to improve generalization and reduce overfitting, validated on complex NLP tasks.

Contribution

The paper proposes LNSR, a new noise-based regularization framework backed by theoretical analysis, and reports that it outperforms existing methods on both simple and complex NLP tasks, including question answering.

Findings

01 LNSR improves in-domain performance of language models.

02 LNSR enhances out-of-domain generalization.

03 LNSR outperforms L2-SP, Mixout, and SMART in experiments.

Abstract

The advent of large-scale pre-trained language models has contributed greatly to the recent progress in natural language processing. Many state-of-the-art language models are first trained on a large text corpus and then fine-tuned on downstream tasks. Despite its recent success and wide adoption, fine-tuning a pre-trained language model often suffers from overfitting, which leads to poor generalizability due to the extremely high complexity of the model and the limited training samples from downstream tasks. To address this problem, we propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR). Specifically, we propose to inject the standard Gaussian noise or In-manifold noise and regularize hidden representations of the fine-tuned model. We first provide theoretical analyses to support the efficacy of our method. We then demonstrate the…
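
The idea of injecting Gaussian noise and regularizing hidden representations can be illustrated with a small sketch. It is not the authors' implementation: the toy `dense_tanh` layers stand in for Transformer blocks, and the squared-distance penalty, the function names, `sigma`, and `reg_weight` are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
import random


def dense_tanh(x, weights, bias):
    """One toy layer: tanh(W x + b), standing in for a Transformer block."""
    return [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def run_layers(h, layers):
    """Pass h through every layer and collect each layer's output."""
    outputs = []
    for weights, bias in layers:
        h = dense_tanh(h, weights, bias)
        outputs.append(h)
    return outputs


def noise_stability_penalty(h, layers, sigma=1.0, rng=random):
    """Assumed penalty: squared distance between clean and noisy outputs,
    summed over all layers that follow the noise injection point."""
    noisy_h = [v + rng.gauss(0.0, sigma) for v in h]  # Gaussian perturbation
    clean = run_layers(h, layers)
    noisy = run_layers(noisy_h, layers)
    return sum(
        (c - n) ** 2
        for clean_out, noisy_out in zip(clean, noisy)
        for c, n in zip(clean_out, noisy_out)
    )


# Hypothetical toy setup: two 3-unit layers with random weights.
rng = random.Random(0)
dim = 3
layers = [
    ([[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)], [0.0] * dim)
    for _ in range(2)
]
hidden = [0.5, -0.2, 0.1]

penalty = noise_stability_penalty(hidden, layers, sigma=1.0, rng=rng)
task_loss = 0.7    # placeholder value, not a real measurement
reg_weight = 0.1   # hypothetical regularization strength
total_loss = task_loss + reg_weight * penalty
```

In this sketch the penalty is zero when no noise is injected and grows as the layers amplify the perturbation, so adding it to the task loss pushes the model toward representations that are stable under input noise.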
