Parameter-Efficient Tuning on Layer Normalization for Pre-trained Language Models

Wang Qi; Yu-Ping Ruan; Yuan Zuo; Taihao Li

arXiv:2211.08682·cs.CL·December 12, 2022·6 cites

Open Access

TL;DR

This paper introduces LN-tuning, a parameter-efficient method that fine-tunes layer normalization parameters in pre-trained language models, achieving state-of-the-art results when combined with other tuning methods.

Contribution

The paper proposes LN-tuning, a novel approach that tunes only the gain and bias terms of layer normalization (about 0.03% of the model's parameters), and demonstrates its effectiveness within a unified tuning framework.

Findings

01

LN-tuning tunes only 0.03% of the parameters and outperforms baselines that have fewer than 0.1% tunable parameters.

02

Combining LN-tuning with prefix-tuning and adapters achieves SOTA performance.

03

Tuning MHA and LayerNorm together improves performance, while tuning FFN and LayerNorm decreases it.
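
To give a feel for how small the LayerNorm share is, the following rough sketch counts gain and bias entries for a hypothetical BERT-base-like encoder. The hidden size, layer count, two LayerNorm modules per block, and the 110M total are all assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical BERT-base-like configuration (assumed, not from the paper).
HIDDEN_SIZE = 768
NUM_LAYERS = 12
LAYER_NORMS_PER_BLOCK = 2     # assumed: one after attention, one after the feed-forward layer
TOTAL_PARAMS = 110_000_000    # assumed rough total size of such a model


def layer_norm_param_count(hidden_size, num_layers, norms_per_block):
    """Count gain and bias entries across all block-level LayerNorm modules."""
    per_norm = 2 * hidden_size  # one gain vector plus one bias vector
    return num_layers * norms_per_block * per_norm


ln_params = layer_norm_param_count(HIDDEN_SIZE, NUM_LAYERS, LAYER_NORMS_PER_BLOCK)
ln_share_percent = 100.0 * ln_params / TOTAL_PARAMS
print(f"LayerNorm parameters: {ln_params}")
print(f"Share of all parameters: {ln_share_percent:.4f}%")
```

Under these assumptions the share comes out at roughly 0.03%, in line with the figure in finding 01; the paper's own accounting may differ.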

Abstract

Conventional fine-tuning encounters increasing difficulties given the size of current Pre-trained Language Models (PLMs), which has made parameter-efficient tuning a focal point of frontier research. Previous methods in this field add tunable adapters to the multi-head attention (MHA) and/or feed-forward network (FFN) of Transformer blocks to enable PLMs to achieve transferability. However, although layer normalization is an important part of the Transformer architecture, its power for parameter-efficient tuning has been ignored. In this paper, we first propose LN-tuning, which tunes only the gain and bias terms of the Layer Normalization module, amounting to only 0.03% of the parameters; it is highly time-efficient and significantly superior to baselines with fewer than 0.1% tunable parameters. Further, we study the unified framework of combining LN-tuning with previous ones and we find that: (1) the unified framework of combining prefix-tuning, the adapter-based method…
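
The core idea of tuning only the gain and bias of layer normalization can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: `layer_norm`, `select_trainable`, and the parameter names below are hypothetical and only loosely modelled on how Transformer checkpoints tend to name their weights.

```python
import math


def layer_norm(x, gain, bias, eps=1e-5):
    """Normalize x to zero mean and unit variance, then apply gain and bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gain, bias)]


def select_trainable(param_names, keyword="LayerNorm"):
    """Return the names that would stay trainable; all others would be frozen."""
    return [name for name in param_names if keyword in name]


# Hypothetical parameter names for a single Transformer block.
names = [
    "layer.0.attention.query.weight",
    "layer.0.attention.output.LayerNorm.weight",  # gain
    "layer.0.attention.output.LayerNorm.bias",
    "layer.0.ffn.dense.weight",
    "layer.0.ffn.LayerNorm.weight",               # gain
    "layer.0.ffn.LayerNorm.bias",
]
trainable = select_trainable(names)

# With unit gain and zero bias the output is just the standardized input.
normalized = layer_norm([1.0, 2.0, 3.0], gain=[1.0, 1.0, 1.0], bias=[0.0, 0.0, 0.0])
print(trainable)
print(normalized)
```

In a real framework the same selection would typically be expressed by disabling gradients for every parameter outside the selected set, so that the optimizer updates only the gain and bias vectors.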

Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Advanced Neural Network Applications · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Softmax · Adam · Absolute Position Encodings