LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

Yi-Lin Sung; Jaemin Cho; Mohit Bansal

arXiv:2206.06522·cs.CL·November 1, 2022·79 cites

TL;DR

Ladder Side-Tuning (LST) is a novel parameter-efficient transfer learning method that significantly reduces training memory by using a small side network with shortcut connections, outperforming existing PETL techniques in accuracy and memory savings.

Contribution

LST introduces a separate ladder side network with shortcut connections, eliminating the need for backpropagation through the entire backbone, thus greatly reducing memory usage during fine-tuning.
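The idea can be sketched in a few lines of PyTorch. This is a minimal toy illustration, not the authors' implementation: the class name `LadderSideNet`, the layer sizes, the linear down-projections, and the sigmoid-gated mixing of side and backbone features are all assumptions chosen for brevity. The point it shows is that the backbone runs under `torch.no_grad()`, so gradients only flow through the small side network.

```python
import torch
import torch.nn as nn


class LadderSideNet(nn.Module):
    """Toy frozen backbone plus a small trainable side network (hypothetical)."""

    def __init__(self, d_backbone=64, d_side=8, n_layers=4, n_classes=2):
        super().__init__()
        # Stand-in for a large pre-trained backbone; its weights are frozen.
        self.backbone = nn.ModuleList(
            [nn.Linear(d_backbone, d_backbone) for _ in range(n_layers)]
        )
        for p in self.backbone.parameters():
            p.requires_grad_(False)
        # Shortcut connections: project each backbone activation to the side width.
        self.down = nn.ModuleList(
            [nn.Linear(d_backbone, d_side) for _ in range(n_layers)]
        )
        # Narrow side layers; these are the layers that get trained.
        self.side = nn.ModuleList(
            [nn.Linear(d_side, d_side) for _ in range(n_layers)]
        )
        # One learned gate per layer (assumed fusion rule for this sketch).
        self.gate = nn.Parameter(torch.zeros(n_layers))
        self.embed = nn.Linear(d_backbone, d_side)
        self.head = nn.Linear(d_side, n_classes)

    def forward(self, x):
        # Run the backbone without recording an autograd graph, so the
        # backward pass never traverses the backbone layers.
        feats = []
        with torch.no_grad():
            h = x
            for layer in self.backbone:
                h = torch.relu(layer(h))
                feats.append(h)
        # The side network consumes the backbone activations via the shortcuts.
        s = self.embed(x)
        for i, (down, side) in enumerate(zip(self.down, self.side)):
            mu = torch.sigmoid(self.gate[i])
            s = torch.relu(side(mu * s + (1 - mu) * down(feats[i])))
        return self.head(s)


model = LadderSideNet()
x = torch.randn(16, 64)
y = torch.randint(0, 2, (16,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
backbone_has_grad = any(p.grad is not None for p in model.backbone.parameters())
side_has_grad = all(p.grad is not None for p in model.side.parameters())
```

After `loss.backward()`, only the side network, shortcut projections, gates, and head hold gradients; the backbone parameters have none. In this toy setup the trainable parameters are a small fraction of the total, but the ratio is an artifact of the chosen sizes and says nothing about the figures reported in the paper.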

Findings

01 LST saves 69% of memory costs compared to full fine-tuning.

02 LST achieves higher accuracy than Adapter and LoRA in low-memory settings.

03 LST outperforms other PETL methods on NLP and vision-language tasks.

Abstract

Fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of domains recently. However, it is costly to update the entire parameter set of large pre-trained models. Although recently proposed parameter-efficient transfer learning (PETL) techniques allow updating a small subset of parameters (e.g. only using 2% of parameters) inside a pre-trained backbone network for a new task, they only reduce the training memory requirement by up to 30%. This is because the gradient computation for the trainable parameters still requires backpropagation through the large pre-trained backbone model. To address this, we propose Ladder Side-Tuning (LST), a new PETL technique that can reduce training memory requirements by more substantial amounts. Unlike existing parameter-efficient methods that insert additional parameters inside backbone networks, we train a ladder side…

Code & Models

Videos

LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Layer Normalization · SentencePiece · Gated Linear Unit · Inverse Square Root Schedule · Softmax