Parameter-Efficient Tuning on Layer Normalization for Pre-trained Language Models
Wang Qi, Yu-Ping Ruan, Yuan Zuo, Taihao Li

TL;DR
This paper introduces LN-tuning, a parameter-efficient method that fine-tunes layer normalization parameters in pre-trained language models, achieving state-of-the-art results when combined with other tuning methods.
Contribution
The paper proposes LN-tuning, a novel approach that tunes only the existing gain and bias terms of layer normalization (about 0.03% of parameters), and demonstrates its effectiveness within a unified tuning framework.
Findings
LN-tuning tunes only 0.03% of parameters and outperforms baselines that have fewer than 0.1% tunable parameters.
Combining LN-tuning with prefix-tuning and adapters achieves SOTA performance.
Tuning MHA and LayerNorm together improves performance, while tuning FFN and LayerNorm decreases it.
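As a rough sanity check on the 0.03% figure, the share of LayerNorm parameters can be estimated with simple arithmetic. The sketch below assumes a hypothetical BERT-base-like encoder (hidden size 768, 12 layers, two LayerNorm modules per layer plus one on the embeddings, about 110M parameters in total). These numbers are illustrative assumptions, not the configuration reported in the paper.

```python
# Hypothetical BERT-base-like configuration (assumed, not taken from the paper).
hidden_size = 768
num_layers = 12
layernorms_per_layer = 2       # one after attention, one after the feed-forward block
embedding_layernorms = 1       # LayerNorm applied to the embedding output
total_params = 110_000_000     # commonly cited approximate size of such a model

# Each LayerNorm holds one gain vector and one bias vector of length hidden_size.
num_layernorms = num_layers * layernorms_per_layer + embedding_layernorms
ln_params = num_layernorms * 2 * hidden_size
fraction = ln_params / total_params

print(f"LayerNorm parameters: {ln_params}")
print(f"Share of all parameters: {fraction:.4%}")
```

Under these assumptions the gain and bias terms come to 38,400 parameters, a few hundredths of a percent of the model, which is the same order of magnitude as the figure quoted above.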
Abstract
Conventional fine-tuning encounters increasing difficulties given the size of current Pre-trained Language Models (PLMs), which makes parameter-efficient tuning the focal point of frontier research. Previous methods in this field add tunable adapters into the MHA and/or FFN of Transformer blocks to enable PLMs to achieve transferability. However, although layer normalization is an important part of the Transformer architecture, its power for parameter-efficient tuning has been ignored. In this paper, we first propose LN-tuning, which tunes the gain and bias terms of the Layer Normalization module with only 0.03% of parameters; it is highly time-efficient and significantly superior to baselines with fewer than 0.1% tunable parameters. Further, we study the unified framework of combining LN-tuning with previous ones and we find that: (1) the unified framework of combining prefix-tuning, the adapter-based method…
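To make concrete what "the gain and bias term" refers to, the following is a minimal pure-Python sketch of standard layer normalization over one hidden vector. It is not the authors' implementation; the function name `layer_norm`, the example vector, and the "tuned" values are hypothetical and chosen only for illustration. In LN-tuning as described above, only `gain` and `bias` would be updated while the rest of the model stays frozen.

```python
import math

def layer_norm(x, gain, bias, eps=1e-5):
    """Normalize one hidden vector, then apply the learnable gain and bias."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n   # biased variance over the features
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gain, bias)]

hidden = [1.0, 2.0, 3.0, 4.0]

# Usual initial values: gain of ones, bias of zeros.
gain = [1.0] * 4
bias = [0.0] * 4
base = layer_norm(hidden, gain, bias)

# Hypothetical values after tuning; only these two vectors differ from above.
tuned_gain = [1.5, 1.0, 1.0, 0.5]
tuned_bias = [0.1, 0.0, 0.0, -0.1]
tuned = layer_norm(hidden, tuned_gain, tuned_bias)

print(base)
print(tuned)
```

Because the gain and bias act per feature after normalization, changing them rescales and shifts each hidden dimension independently without touching any attention or feed-forward weights.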
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Neural Network Applications · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Softmax · Adam · Absolute Position Encodings
