Noise Stability Regularization for Improving BERT Fine-tuning

Hang Hua; Xingjian Li; Dejing Dou; Cheng-Zhong Xu; Jiebo Luo

arXiv:2107.04835·cs.CL·July 13, 2021

TL;DR

This paper introduces Layer-wise Noise Stability Regularization (LNSR), a novel method to enhance the stability and generalizability of BERT fine-tuning, especially with limited training data, by leveraging noise stability properties of deep networks.

Contribution

The paper proposes a new regularization technique, LNSR, grounded in noise stability theory, to improve BERT fine-tuning stability and outperform existing methods.

Findings

1. LNSR improves fine-tuning stability and generalizability.

2. Models with LNSR show lower sensitivity to noise.

3. LNSR outperforms state-of-the-art regularization methods.

Abstract

Fine-tuning pre-trained language models such as BERT has become a common practice dominating leaderboards across various NLP tasks. Despite its recent success and wide adoption, this process is unstable when there are only a small number of training samples available. The brittleness of this process is often reflected by the sensitivity to random seeds. In this paper, we propose to tackle this problem based on the noise stability property of deep nets, which is investigated in recent literature (Arora et al., 2018; Sanyal et al., 2020). Specifically, we introduce a novel and effective regularization method to improve fine-tuning on NLP tasks, referred to as Layer-wise Noise Stability Regularization (LNSR). We extend the theories about adding noise to the input and prove that our method gives a stabler regularization effect. We provide supportive evidence by experimentally confirming…
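
The abstract describes the idea only at a high level, so the following is a minimal sketch of a generic noise stability penalty and not the authors' implementation. It assumes a toy stack of tanh layers in NumPy in place of BERT, Gaussian noise injected into the hidden state entering one layer, and a mean squared gap between clean and perturbed outputs of every later layer. The names `run_layers`, `noise_stability_penalty`, `inject_at`, and `sigma` are hypothetical.

```python
import numpy as np

def run_layers(h, weights):
    """Apply a toy stack of tanh layers and return the output of each layer."""
    outputs = []
    for w in weights:
        h = np.tanh(h @ w)
        outputs.append(h)
    return outputs

def noise_stability_penalty(x, weights, inject_at, sigma, rng):
    """Sum over layers >= inject_at of the mean squared gap between the
    clean output and the output after Gaussian noise is added to the
    hidden state entering layer `inject_at` (assumed formulation)."""
    # Clean hidden state that enters layer `inject_at`.
    h = x
    for w in weights[:inject_at]:
        h = np.tanh(h @ w)
    clean = run_layers(h, weights[inject_at:])
    noisy_input = h + sigma * rng.standard_normal(h.shape)
    noisy = run_layers(noisy_input, weights[inject_at:])
    return sum(np.mean((c - n) ** 2) for c, n in zip(clean, noisy))

# Toy usage: 3 layers of width 4, batch of 2 inputs (all values are made up).
rng = np.random.default_rng(0)
weights = [0.5 * rng.standard_normal((4, 4)) for _ in range(3)]
x = rng.standard_normal((2, 4))
penalty = noise_stability_penalty(x, weights, inject_at=1, sigma=0.1, rng=rng)
```

In a fine-tuning loop, a penalty of this kind would typically be added to the task loss with a weighting coefficient, so that the optimizer favors parameters whose upper-layer representations change little under small perturbations.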

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Residual Connection · Dense Connections · Softmax · WordPiece · Layer Normalization