UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Haiwen Diao, Bo Wan, Ying Zhang, Xu Jia, Huchuan Lu, Long Chen

TL;DR
UniPT introduces a universal, memory-efficient transfer learning method that uses a lightweight parallel network to improve adaptability, scalability, and performance across diverse models and tasks.
Contribution
The paper proposes UniPT, a novel parallel tuning strategy that decouples transfer learning from backbone dependencies, reducing memory use and enhancing generalizability.
Findings
Reduces memory consumption significantly.
Outperforms existing PETL methods on multiple datasets.
Achieves competitive or superior task performance.
Abstract
Parameter-efficient transfer learning (PETL), i.e., fine-tuning a small portion of parameters, is an effective strategy for adapting pre-trained models to downstream domains. To further reduce the memory demand, recent PETL works focus on the more valuable memory-efficient characteristic. In this paper, we argue that the scalability, adaptability, and generalizability of state-of-the-art methods are hindered by structural dependency and pertinency on specific pre-trained backbones. To this end, we propose a new memory-efficient PETL strategy, Universal Parallel Tuning (UniPT), to mitigate these weaknesses. Specifically, we facilitate the transfer process via a lightweight and learnable parallel network, which consists of: 1) A parallel interaction module that decouples the sequential connections and processes the intermediate activations detachedly from the pre-trained network. 2) A…
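The data flow described in the abstract (a frozen backbone whose intermediate activations feed a small, separately trained network) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the toy `backbone`, the `ParallelSide` class, the layer sizes, and the simple summation used to merge per-layer features are all hypothetical stand-ins, and the second component of the method is omitted because the abstract is truncated here. The sketch only shows why such a design can save memory: the backbone runs without an autograd graph, so gradients are computed for the side network alone.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a frozen pre-trained backbone: three linear blocks.
backbone = nn.ModuleList([nn.Linear(16, 16) for _ in range(3)])
for p in backbone.parameters():
    p.requires_grad_(False)


class ParallelSide(nn.Module):
    """Hypothetical lightweight side network fed by detached backbone activations."""

    def __init__(self, hidden=16, reduced=4, num_layers=3, num_classes=2):
        super().__init__()
        # One small down-projection per tapped backbone layer.
        self.down = nn.ModuleList(
            [nn.Linear(hidden, reduced) for _ in range(num_layers)]
        )
        self.head = nn.Linear(reduced, num_classes)

    def forward(self, activations):
        # Assumption: merge per-layer features by summation (an illustrative choice).
        merged = sum(proj(a) for proj, a in zip(self.down, activations))
        return self.head(merged)


def forward(x, side):
    activations = []
    with torch.no_grad():  # no autograd graph is built for the backbone
        h = x
        for block in backbone:
            h = torch.relu(block(h))
            activations.append(h)
    # Only the side network participates in backpropagation.
    return side(activations)


side = ParallelSide()
x = torch.randn(8, 16)
y = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(forward(x, side), y)
loss.backward()

backbone_has_grad = any(p.grad is not None for p in backbone.parameters())
side_has_grad = all(p.grad is not None for p in side.parameters())
trainable = sum(p.numel() for p in side.parameters())
frozen = sum(p.numel() for p in backbone.parameters())
```

In this toy setup the side network holds 214 trainable parameters against 816 frozen backbone parameters, and after `loss.backward()` only the side network's parameters carry gradients.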
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Neural Networks and Applications
Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Byte Pair Encoding · Layer Normalization · Attention Dropout · Softmax · Dense Connections · Focus
