On the Effectiveness of Parameter-Efficient Fine-Tuning
Zihao Fu, Haoran Yang, Anthony Man-Cho So, Wai Lam, Lidong Bing, Nigel Collier

TL;DR
This paper analyzes parameter-efficient fine-tuning methods, providing a theoretical account of their stability and generalization, and introduces a novel Second-order Approximation Method (SAM) for choosing which parameters to tune.
Contribution
It categorizes existing sparse fine-tuning methods, offers a theoretical analysis of their regularization effect, and proposes SAM for better parameter selection.
Findings
Sparsity acts as regularization, improving stability and generalization.
Theoretical analysis explains why sparse fine-tuning outperforms full fine-tuning.
SAM outperforms strong baselines in experiments.
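The idea behind these findings, tuning only a small, selected subset of parameters while the rest stay at their pre-trained values, can be illustrated with a toy sketch. This is not the paper's SAM implementation: the linear model, the rule of selecting the largest-magnitude gradients, and every name below (`grad_mse`, `top_k_mask`, `sparse_fine_tune`, `mse`) are assumptions made purely for illustration.

```python
def mse(theta, xs, ys):
    """Mean squared error of a linear model y ~ theta . x."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(t * xi for t, xi in zip(theta, x))
        total += (pred - y) ** 2
    return total / len(xs)


def grad_mse(theta, xs, ys):
    """Gradient of the mean squared error with respect to theta."""
    n = len(xs)
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        err = sum(t * xi for t, xi in zip(theta, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / n
    return grad


def top_k_mask(scores, k):
    """Binary mask with 1 at the k entries of largest absolute value."""
    order = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    chosen = set(order[:k])
    return [1 if j in chosen else 0 for j in range(len(scores))]


def sparse_fine_tune(theta0, xs, ys, k, lr=0.05, steps=200):
    """Update only k selected parameters; all others keep their initial values.

    Assumption: the mask is chosen once, from gradient magnitudes at theta0.
    """
    mask = top_k_mask(grad_mse(theta0, xs, ys), k)
    theta = list(theta0)
    for _ in range(steps):
        grad = grad_mse(theta, xs, ys)
        # A zero mask entry cancels the update for that parameter.
        theta = [t - lr * m * g for t, m, g in zip(theta, mask, grad)]
    return theta, mask


# Toy "pre-trained" weights and a downstream task that mostly needs
# the first weight to change.
theta0 = [1.0, 1.0, 1.0, 1.0]
xs = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0],
      [0.0, 0.0, 0.0, 1.0]]
ys = [3.0, 1.0, 1.0, 1.2]

theta, mask = sparse_fine_tune(theta0, xs, ys, k=1)
print("mask:", mask)
print("loss before:", mse(theta0, xs, ys), "loss after:", mse(theta, xs, ys))
```

Here `k` plays the role of the sparsity budget: a smaller `k` keeps more of the pre-trained weights fixed and shared across tasks, which is the constraint the summary describes as acting like a regularizer.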
Abstract
Fine-tuning pre-trained models has been ubiquitously proven to be effective in a wide range of NLP tasks. However, fine-tuning the whole model is parameter inefficient as it always yields an entirely new model for each task. Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks. These methods achieve surprisingly good performance and are shown to be more stable than their corresponding fully fine-tuned counterparts. However, such methods are still not well understood. Some natural questions arise: How does the parameter sparsity lead to promising performance? Why is the model more stable than the fully fine-tuned models? How to choose the tunable parameters? In this paper, we first categorize the existing methods into random approaches, rule-based approaches, and…
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Multimodal Machine Learning Applications
