LayerNorm: A key component in parameter-efficient fine-tuning

Taha ValizadehAslani; Hualou Liang

arXiv:2403.20284 · cs.CL · April 1, 2024 · 1 citation

Open Access

TL;DR

This paper identifies LayerNorm as the most critical component of BERT for fine-tuning and shows that fine-tuning only LayerNorm achieves performance comparable to fine-tuning the full model, making adaptation to NLP tasks cheaper.

Contribution

The study reveals that fine-tuning only LayerNorm layers in BERT is sufficient for competitive performance, offering a simple yet effective parameter-efficient fine-tuning method.

Findings

01  Of all BERT components, the output LayerNorm changes the most during fine-tuning, across tasks.

02  Fine-tuning only LayerNorm gives performance comparable to full fine-tuning.

03  Even a small subset of the LayerNorm parameters can be fine-tuned with negligible performance loss.
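The second finding can be illustrated with a minimal PyTorch sketch: freeze every parameter, then re-enable gradients only for the LayerNorm modules. This is not the authors' code. The helper name `freeze_all_but_layernorm` and the tiny stand-in encoder (with made-up sizes) are assumptions used here so the example runs without downloading a BERT checkpoint.

```python
import torch
from torch import nn


def freeze_all_but_layernorm(model: nn.Module) -> list:
    """Freeze every parameter, then re-enable gradients for nn.LayerNorm modules only.

    Hypothetical helper for illustration; returns the list of trainable parameters.
    """
    for param in model.parameters():
        param.requires_grad = False
    trainable = []
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            for param in module.parameters():
                param.requires_grad = True
                trainable.append(param)
    return trainable


# Tiny stand-in encoder (assumed sizes), used in place of a real BERT model.
torch.manual_seed(0)
toy_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, dim_feedforward=64, batch_first=True),
    num_layers=2,
)

trainable_params = freeze_all_but_layernorm(toy_encoder)

# Only the LayerNorm weights and biases are handed to the optimizer.
optimizer = torch.optim.AdamW(trainable_params, lr=1e-3)

n_trainable = sum(p.numel() for p in trainable_params)
n_total = sum(p.numel() for p in toy_encoder.parameters())
print(f"trainable: {n_trainable} of {n_total} parameters")
```

The same helper should apply to any model whose normalization layers are `nn.LayerNorm` instances. A task-specific classification head, if present, would normally also need to stay trainable; that choice is left out of this sketch.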

Abstract

Fine-tuning a pre-trained model, such as Bidirectional Encoder Representations from Transformers (BERT), has proven to be an effective method for solving many natural language processing (NLP) tasks. However, because many state-of-the-art NLP models, including BERT, have a large number of parameters, fine-tuning is computationally expensive. One attractive solution to this issue is parameter-efficient fine-tuning, which modifies only a minimal segment of the model while keeping the remainder unchanged. Yet it remains unclear which segment of the BERT model is crucial for fine-tuning. In this paper, we first analyze different components in the BERT model to pinpoint which one undergoes the most significant changes after fine-tuning. We find that output LayerNorm changes more than any other component when fine-tuned for different General Language…
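The component analysis described in the abstract compares a model before and after fine-tuning. A rough sketch of such a comparison follows. The metric used here (relative L2 change per component) is an assumption, since the abstract as shown does not state which measure the authors use. The parameter names, the toy tensors, and the function `relative_change_by_component` are likewise hypothetical.

```python
import torch


def relative_change_by_component(before, after, component_of):
    """Group parameters by component and return ||after - before|| / ||before|| per group.

    `before` and `after` map parameter names to tensors (for example, two state dicts).
    `component_of` maps a parameter name to a component label.
    The relative L2 norm is an assumed metric, not necessarily the paper's.
    """
    diff_sq, base_sq = {}, {}
    for name, w0 in before.items():
        comp = component_of(name)
        diff_sq[comp] = diff_sq.get(comp, 0.0) + (after[name] - w0).pow(2).sum().item()
        base_sq[comp] = base_sq.get(comp, 0.0) + w0.pow(2).sum().item()
    return {comp: (diff_sq[comp] / base_sq[comp]) ** 0.5 for comp in diff_sq}


# Toy "pre-trained" and "fine-tuned" weights with made-up names and values.
before = {
    "layer.0.output.dense.weight": torch.ones(4, 4),
    "layer.0.output.LayerNorm.weight": torch.ones(4),
}
after = {
    "layer.0.output.dense.weight": torch.ones(4, 4) + 0.01,
    "layer.0.output.LayerNorm.weight": torch.ones(4) + 0.1,
}

# Label each parameter by the module name just before ".weight".
changes = relative_change_by_component(before, after, lambda name: name.split(".")[-2])
most_changed = max(changes, key=changes.get)
print(changes, most_changed)
```

In this toy setup the LayerNorm weights move by about 10% and the dense weights by about 1%, so `most_changed` is `"LayerNorm"`. With real checkpoints, `before` and `after` would be the state dicts of the pre-trained and fine-tuned models.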

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · WordPiece · Multi-Head Attention · Weight Decay · Softmax · Dense Connections