Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
Zheda Mai, Ping Zhang, Cheng-Hao Tu, Hong-You Chen, Li Zhang, Wei-Lun Chao

TL;DR
This paper systematically compares parameter-efficient fine-tuning (PEFT) methods for visual recognition. It shows that they reach comparable accuracy but make different errors, which makes them useful in ensembles, and it also examines their robustness and efficiency.
Contribution
It provides a comprehensive empirical study of PEFT methods on Vision Transformers, offering practical insights, a user guide, and new findings on their performance and robustness.
Findings
PEFT methods achieve similar accuracy in low-shot tasks when carefully tuned.
Different PEFT methods make different mistakes and differ in which predictions they make with high confidence.
PEFT is effective beyond low-shot regimes, matching or surpassing full fine-tuning with fewer parameters.
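The second finding, and the ensemble potential mentioned in the TL;DR, can be illustrated with a toy sketch. The logits below are made-up values for two hypothetical PEFT models, not data from the paper. The ensemble averages softmax probabilities, which is one common choice; the paper's exact ensembling scheme is not described in this summary. The 0.9 confidence threshold is likewise an arbitrary assumption.

```python
import math
from typing import List, Sequence, Set

# Hypothetical logits from two independently fine-tuned PEFT models
# (4 examples, 3 classes). Made-up numbers, NOT results from the paper.
LABELS = [0, 1, 2, 1]
MODEL_A = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.4, 0.0, 0.1], [0.0, 3.0, 0.0]]
MODEL_B = [[2.0, 0.0, 0.0], [0.1, 0.0, 0.3], [0.0, 0.0, 3.0], [0.0, 2.0, 0.0]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax for one example's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def error_set(model_logits, labels) -> Set[int]:
    """Indices of the examples a model misclassifies."""
    return {i for i, (z, y) in enumerate(zip(model_logits, labels)) if argmax(z) != y}


def high_confidence_set(model_logits, threshold: float = 0.9) -> Set[int]:
    """Indices where the top softmax probability reaches `threshold` (arbitrary cutoff)."""
    return {i for i, z in enumerate(model_logits) if max(softmax(z)) >= threshold}


def ensemble_predict(all_model_logits) -> List[int]:
    """Average softmax probabilities across models, then take the argmax per example."""
    preds = []
    for per_example in zip(*all_model_logits):  # one tuple of logits per example
        probs = [softmax(z) for z in per_example]
        avg = [sum(col) / len(probs) for col in zip(*probs)]  # mean prob per class
        preds.append(argmax(avg))
    return preds


def accuracy(preds, labels) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    print("errors A:", error_set(MODEL_A, LABELS), "errors B:", error_set(MODEL_B, LABELS))
    print("confident A:", high_confidence_set(MODEL_A), "confident B:", high_confidence_set(MODEL_B))
    print("acc A:", accuracy([argmax(z) for z in MODEL_A], LABELS))
    print("acc B:", accuracy([argmax(z) for z in MODEL_B], LABELS))
    print("acc ensemble:", accuracy(ensemble_predict([MODEL_A, MODEL_B]), LABELS))
```

On this toy data each model reaches 0.75 accuracy, their errors and their high-confidence predictions fall on different examples, and the averaged ensemble classifies all four correctly. Real gains depend on how much the models' mistakes actually overlap.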
Abstract
Parameter-efficient fine-tuning (PEFT) has attracted significant attention due to the growth of pre-trained model sizes and the need to fine-tune (FT) them for superior downstream performance. Despite a surge in new PEFT methods, a systematic study to understand their performance and suitable application scenarios is lacking, leaving questions like "when to apply PEFT" and "which method to use" largely unanswered, especially in visual recognition. In this paper, we conduct a unifying empirical study of representative PEFT methods with Vision Transformers. We systematically tune their hyperparameters to fairly compare their accuracy on downstream tasks. Our study offers a practical user guide and unveils several new insights. First, if tuned carefully, different PEFT methods achieve similar accuracy in the low-shot benchmark VTAB-1K. This includes simple approaches like FT the bias terms…
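The abstract singles out fine-tuning only the bias terms as a simple PEFT approach. A minimal sketch of why this is parameter-efficient, assuming a ViT-B-sized encoder (hidden size 768, 12 blocks, MLP ratio 4) chosen purely for illustration and counting only the encoder blocks (patch embedding, position embeddings, class token, and classification head are excluded). Whether LayerNorm biases are tuned varies between implementations; this sketch includes them. The parameter names are hypothetical. In PyTorch, the same predicate would typically be applied by setting `requires_grad = False` on every parameter whose name fails it.

```python
import math
from typing import Callable, Dict, Tuple


def vit_block_param_shapes(d: int, mlp_ratio: int = 4) -> Dict[str, Tuple[int, ...]]:
    """Parameter shapes of one standard pre-norm ViT encoder block.

    Names are hypothetical (loosely following common ViT codebases), not taken from the paper.
    """
    h = d * mlp_ratio
    return {
        "norm1.weight": (d,), "norm1.bias": (d,),
        "attn.qkv.weight": (3 * d, d), "attn.qkv.bias": (3 * d,),
        "attn.proj.weight": (d, d), "attn.proj.bias": (d,),
        "norm2.weight": (d,), "norm2.bias": (d,),
        "mlp.fc1.weight": (h, d), "mlp.fc1.bias": (h,),
        "mlp.fc2.weight": (d, h), "mlp.fc2.bias": (d,),
    }


def is_bias(name: str) -> bool:
    """Bias tuning: only parameters named '*.bias' are trainable (LayerNorm biases included)."""
    return name.endswith(".bias")


def count_params(depth: int, d: int, trainable: Callable[[str], bool]) -> Tuple[int, int]:
    """Return (trainable, total) parameter counts over `depth` encoder blocks."""
    tuned = total = 0
    for i in range(depth):
        for name, shape in vit_block_param_shapes(d).items():
            n = math.prod(shape)  # number of scalars in this tensor
            total += n
            if trainable(f"blocks.{i}.{name}"):
                tuned += n
    return tuned, total


if __name__ == "__main__":
    tuned, total = count_params(depth=12, d=768, trainable=is_bias)
    print(f"bias tuning trains {tuned:,} of {total:,} block parameters "
          f"({100 * tuned / total:.2f}%)")
```

Under these assumptions, bias tuning trains 101,376 of 85,054,464 block parameters, about 0.12%; a task-specific classification head, which is typically trained as well, would add to the trainable count.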
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Face and Expression Recognition · Neural Networks and Applications
Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training
