Sparse Structure Search for Parameter-Efficient Tuning

Shengding Hu; Zhen Zhang; Ning Ding; Yadao Wang; Yasheng Wang; Zhiyuan Liu; Maosong Sun

arXiv:2206.07382·cs.CL·June 16, 2022·6 cites

Open Access

TL;DR

This paper introduces S$^3$PET, an automatic method for searching sparse, parameter-efficient tuning structures in large pre-trained models, achieving high performance with minimal trainable parameters.

Contribution

It proposes a differentiable search framework for sparse PET structures, surpassing manual designs and enabling effective tuning with extremely low parameter budgets.
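To make the idea of a differentiable structure search concrete, here is a toy sketch in plain Python. It is not the authors' algorithm: it only illustrates the general pattern of relaxing each on/off placement decision into a sigmoid gate, updating the gate logits by gradient ascent under a soft parameter budget, and then discretizing. The function names (`search_gates`, `select_structure`), the per-position `utility` scores, and the hinge-style `penalty` are all assumptions made for illustration; in a real setting the signal would come from a task loss rather than fixed scores.

```python
import math


def sigmoid(x):
    """Logistic function mapping a gate logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def search_gates(utility, sizes, budget, steps=500, lr=0.5, penalty=5.0):
    """Gradient ascent on gate logits (hypothetical toy objective).

    Objective: sum_i p_i * utility[i] - penalty * max(0, sum_i p_i * sizes[i] - budget),
    where p_i = sigmoid(alpha_i) is the relaxed on/off gate of position i.
    """
    alphas = [0.0] * len(utility)
    for _ in range(steps):
        probs = [sigmoid(a) for a in alphas]
        expected_size = sum(p * s for p, s in zip(probs, sizes))
        over_budget = expected_size > budget
        for i, p in enumerate(probs):
            # Gradient of the objective w.r.t. p_i; the budget term is active
            # only while the expected size exceeds the budget.
            grad_p = utility[i] - (penalty * sizes[i] if over_budget else 0.0)
            # Chain rule through the sigmoid: dp/dalpha = p * (1 - p).
            alphas[i] += lr * grad_p * p * (1.0 - p)
    return [sigmoid(a) for a in alphas]


def select_structure(probs, sizes, budget):
    """Discretize: keep the most probable positions that still fit the budget."""
    chosen, used = [], 0
    for i in sorted(range(len(probs)), key=lambda j: -probs[j]):
        if used + sizes[i] <= budget:
            chosen.append(i)
            used += sizes[i]
    return sorted(chosen)


if __name__ == "__main__":
    # Four candidate positions with made-up utilities, one parameter unit each.
    utility = [0.9, 0.1, 0.5, 0.05]
    sizes = [1, 1, 1, 1]
    probs = search_gates(utility, sizes, budget=2)
    print(select_structure(probs, sizes, budget=2))
```

With equal module sizes, the gates of higher-utility positions end up with higher probabilities, so the discretization step keeps the two positions with the largest made-up utilities. The soft budget penalty is one simple way to keep the expected number of active modules near the target; other constraint mechanisms would fit the same skeleton.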

Findings

01. S$^3$PET outperforms manual and random structures in experiments.

02. It preserves over 99% of fine-tuning performance with only 0.01% trainable parameters.

03. The searched structures are transferable and provide design insights for PET methods.

Abstract

Adapting large pre-trained models (PTMs) through fine-tuning imposes prohibitive computational and storage burdens. Recent studies of parameter-efficient tuning (PET) find that optimizing only a small portion of parameters conditioned on PTMs can yield performance on par with conventional fine-tuning. Generally, PET methods carefully design parameter-efficient modules (PET modules) that can be applied to arbitrary fine-grained positions inside PTMs. However, the effectiveness of these fine-grained positions relies largely on sophisticated manual designation, which usually produces sub-optimal results. In contrast to manual designation, we explore constructing PET modules automatically. We automatically \textbf{S}earch for the \textbf{S}parse \textbf{S}tructure of \textbf{P}arameter-\textbf{E}fficient \textbf{T}uning (S$^3$PET). Based on a unified framework…

Taxonomy

Topics: Modular Robots and Swarm Intelligence · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques