LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
Yi-Lin Sung, Jaemin Cho, Mohit Bansal

TL;DR
Ladder Side-Tuning (LST) is a parameter-efficient transfer learning (PETL) method that cuts training memory by training a small side network fed through shortcut connections from the pre-trained backbone. It reports higher accuracy than existing PETL techniques at comparable or lower memory cost.
Contribution
LST introduces a separate ladder side network with shortcut connections, eliminating the need for backpropagation through the entire backbone, thus greatly reducing memory usage during fine-tuning.
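The structure described above can be illustrated with a minimal, forward-only sketch in plain Python. It is not the authors' implementation: the layer widths, the constant toy weights, the sigmoid gate on a scalar `alpha`, and every name below (`lst_forward`, `stem`, `down`, and so on) are assumptions chosen for illustration. The point it shows is that the backbone is only run forward, while every trainable parameter lives in the narrower side network that reads the backbone's intermediate activations.

```python
import math

def linear(W, b, x):
    # y[j] = sum_i W[j][i] * x[i] + b[j], on plain Python lists
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def const_layer(n_out, n_in, value):
    # Toy layer (hypothetical): every weight equals `value`, zero bias
    return {"W": [[value] * n_in for _ in range(n_out)], "b": [0.0] * n_out}

D_BACKBONE, D_SIDE, N_LAYERS = 8, 2, 2  # assumed toy sizes

# Backbone: never updated, only evaluated in the forward direction
backbone = [const_layer(D_BACKBONE, D_BACKBONE, 0.25) for _ in range(N_LAYERS)]

# Side network: the only parameters that would be trained
side = {
    "stem": const_layer(D_SIDE, D_BACKBONE, 0.1),    # projects the input
    "down": [const_layer(D_SIDE, D_BACKBONE, 0.1)    # one shortcut per layer
             for _ in range(N_LAYERS)],
    "layers": [const_layer(D_SIDE, D_SIDE, 0.5) for _ in range(N_LAYERS)],
    "alpha": [0.0] * N_LAYERS,                       # gate logits (assumed form)
}

def lst_forward(x, backbone, side):
    h_b = x                                          # backbone hidden state
    h_s = linear(side["stem"]["W"], side["stem"]["b"], x)  # side hidden state
    for i, layer in enumerate(backbone):
        # Backbone step: in a real framework this would run without
        # gradient tracking, so no backward pass goes through it.
        h_b = relu(linear(layer["W"], layer["b"], h_b))
        # Shortcut: project the backbone activation down to the side width
        shortcut = linear(side["down"][i]["W"], side["down"][i]["b"], h_b)
        # Gate mixes the shortcut with the previous side state
        mu = sigmoid(side["alpha"][i])
        fused = [mu * s + (1.0 - mu) * p for s, p in zip(shortcut, h_s)]
        h_s = relu(linear(side["layers"][i]["W"], side["layers"][i]["b"], fused))
    return h_s  # predictions would be computed from the side network's output

def count_params(layers):
    return sum(len(l["W"]) * len(l["W"][0]) + len(l["b"]) for l in layers)

backbone_params = count_params(backbone)
side_params = (count_params([side["stem"], *side["down"], *side["layers"]])
               + len(side["alpha"]))
output = lst_forward([0.5] * D_BACKBONE, backbone, side)
```

Even in this toy setting the side network holds fewer parameters than the backbone, and the gap widens as the backbone grows relative to the side width. Training itself (loss, gradients, optimizer) is deliberately left out of the sketch.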
Findings
LST reduces training memory by 69% relative to full fine-tuning.
LST achieves higher accuracy than Adapter and LoRA in low-memory settings.
LST outperforms other PETL methods on NLP and vision-language tasks.
Abstract
Fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of domains recently. However, it is costly to update the entire parameter set of large pre-trained models. Although recently proposed parameter-efficient transfer learning (PETL) techniques allow updating a small subset of parameters (e.g. only using 2% of parameters) inside a pre-trained backbone network for a new task, they only reduce the training memory requirement by up to 30%. This is because the gradient computation for the trainable parameters still requires backpropagation through the large pre-trained backbone model. To address this, we propose Ladder Side-Tuning (LST), a new PETL technique that can reduce training memory requirements by more substantial amounts. Unlike existing parameter-efficient methods that insert additional parameters inside backbone networks, we train a ladder side…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Layer Normalization · SentencePiece · Gated Linear Unit · Inverse Square Root Schedule · Softmax
