Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization
Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo

TL;DR
This paper introduces Layerwise Noise Stability Regularization (LNSR), a fine-tuning method for pre-trained language models that injects noise into hidden representations and regularizes them to improve generalization and reduce overfitting. The method is evaluated on NLP tasks including question answering.
Contribution
The paper proposes LNSR, a noise-based regularization framework backed by theoretical analysis, and reports that it outperforms existing methods on both simple and complex NLP tasks, including question answering.
Findings
LNSR improves in-domain performance of language models.
LNSR enhances out-of-domain generalization.
LNSR outperforms the fine-tuning baselines L2-SP, Mixout, and SMART in the reported experiments.
Abstract
The advent of large-scale pre-trained language models has contributed greatly to the recent progress in natural language processing. Many state-of-the-art language models are first trained on a large text corpus and then fine-tuned on downstream tasks. Despite its recent success and wide adoption, fine-tuning a pre-trained language model often suffers from overfitting, which leads to poor generalizability due to the extremely high complexity of the model and the limited training samples from downstream tasks. To address this problem, we propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR). Specifically, we propose to inject standard Gaussian noise or in-manifold noise and regularize the hidden representations of the fine-tuned model. We first provide theoretical analyses to support the efficacy of our method. We then demonstrate the…
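The idea of injecting Gaussian noise into a hidden representation and penalizing how far the later representations move can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the toy tanh layers, the function names, the choice to sum squared distances over every layer above the injection point, and the parameters `inject_at` and `sigma` are all assumptions made for illustration. The in-manifold noise variant is not shown.

```python
import math
import random


def layer_forward(weights, x):
    """One toy dense layer with tanh activation: tanh(W @ x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def forward_from(layers, x, start=0):
    """Run x through layers[start:], returning the output of every layer visited."""
    outputs = []
    for weights in layers[start:]:
        x = layer_forward(weights, x)
        outputs.append(x)
    return outputs


def noise_stability_penalty(layers, x, inject_at, sigma, rng):
    """Sum of squared distances between clean and noise-perturbed hidden states.

    Gaussian noise with standard deviation `sigma` is added to the input of
    layer `inject_at` (a hypothetical parameter); the penalty covers that
    layer and every layer above it.
    """
    # Clean hidden state that feeds layer `inject_at`.
    hidden = x
    for weights in layers[:inject_at]:
        hidden = layer_forward(weights, hidden)

    # Perturbed copy of the same hidden state.
    noisy_hidden = [h + rng.gauss(0.0, sigma) for h in hidden]

    clean_outputs = forward_from(layers, hidden, start=inject_at)
    noisy_outputs = forward_from(layers, noisy_hidden, start=inject_at)

    penalty = 0.0
    for clean, noisy in zip(clean_outputs, noisy_outputs):
        penalty += sum((c - n) ** 2 for c, n in zip(clean, noisy))
    return penalty


# Toy usage: three random 4x4 layers stand in for a pre-trained network.
rng = random.Random(0)
dim = 4
layers = [
    [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    for _ in range(3)
]
x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

penalty = noise_stability_penalty(layers, x, inject_at=1, sigma=0.1, rng=rng)
no_noise_penalty = noise_stability_penalty(layers, x, inject_at=1, sigma=0.0, rng=rng)
```

In a fine-tuning loop, a term like `penalty` would typically be scaled by a coefficient and added to the task loss, so that the model is discouraged from amplifying small perturbations of its hidden states. The coefficient and the layers to perturb would be hyperparameters; their values here are placeholders, not settings taken from the paper.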
