Sparse Structure Search for Parameter-Efficient Tuning
Shengding Hu, Zhen Zhang, Ning Ding, Yadao Wang, Yasheng Wang, Zhiyuan Liu, Maosong Sun

TL;DR
This paper introduces S$^3$PET, an automatic method for searching sparse, parameter-efficient tuning structures in large pre-trained models, achieving high performance with minimal trainable parameters.
Contribution
It proposes a differentiable search framework for sparse PET structures, surpassing manual designs and enabling effective tuning with extremely low parameter budgets.
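As a rough illustration of what "sparse structure under a parameter budget" can mean, the sketch below turns per-position gate scores into a discrete structure by keeping the highest-scoring candidate positions that fit a budget. It is not the authors' implementation: the position names, logits, costs, and the greedy selection rule are all hypothetical, and the logits are hard-coded here rather than learned by a differentiable search.

```python
import math


def sigmoid(x):
    """Map a structural logit to a gate probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_positions(logits, costs, budget):
    """Greedily keep the candidate positions with the highest gate
    probability whose total trainable-parameter cost fits in `budget`.

    This greedy rule is an assumption made for illustration only.
    """
    probs = {name: sigmoid(a) for name, a in logits.items()}
    chosen, used = [], 0
    # Visit candidates from most to least probable.
    for name in sorted(probs, key=probs.get, reverse=True):
        if used + costs[name] <= budget:
            chosen.append(name)
            used += costs[name]
    return chosen, used


# Hypothetical candidate positions; in a real search the logits would be learned.
logits = {
    "layer1.attn.lora": 2.0,
    "layer1.ffn.adapter": -1.0,
    "layer2.attn.bias": 0.5,
    "layer2.ffn.lora": 1.2,
}
# Hypothetical trainable-parameter cost of enabling each position.
costs = {
    "layer1.attn.lora": 400,
    "layer1.ffn.adapter": 800,
    "layer2.attn.bias": 50,
    "layer2.ffn.lora": 400,
}

chosen, used = select_positions(logits, costs, budget=900)
print(chosen, used)
```

With these made-up numbers, the adapter at `layer1.ffn.adapter` is dropped because it has the lowest gate probability and would exceed the budget, while the three cheaper positions are kept.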
Findings
S$^3$PET outperforms manual and random structures in experiments.
It preserves over 99% of fine-tuning performance with only 0.01% trainable parameters.
The searched structures are transferable and provide design insights for PET methods.
Abstract
Adapting large pre-trained models (PTMs) through fine-tuning imposes prohibitive computational and storage burdens. Recent studies of parameter-efficient tuning (PET) find that only optimizing a small portion of parameters conditioned on PTMs could yield on-par performance compared to conventional fine-tuning. Generally, PET methods exquisitely design parameter-efficient modules (PET modules) which could be applied to arbitrary fine-grained positions inside PTMs. However, the effectiveness of these fine-grained positions largely relies on sophisticated manual designation, thereby usually producing sub-optimal results. In contrast to the manual designation, we explore constructing PET modules in an automatic manner. We automatically Search for the Sparse Structure of Parameter-Efficient Tuning (S$^3$PET). Based on a unified framework…
Taxonomy
Topics: Modular Robots and Swarm Intelligence · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques
