DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
Zhengxiang Shi, Aldo Lipani

TL;DR
DePT introduces a decomposed prompt tuning method that reduces memory and time costs while improving performance in parameter-efficient fine-tuning for large language and vision-language models.
Contribution
DePT decomposes a soft prompt into a shorter soft prompt and a pair of low-rank matrices. This shortens the input sequence, and with it the time and memory cost, without increasing the number of trainable parameters.
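The "no extra trainable parameters" claim is a budget argument. Below is a minimal sketch. It assumes the decomposition is parameterized as a shorter prompt of m vectors of width d, plus factors A (s × r) and B (r × d), where s is the maximum input length. The helper names and the example sizes (d = 768, s = 256, l = 100) are illustrative assumptions, not the paper's reported configuration.

```python
def pt_params(l: int, d: int) -> int:
    """Trainable parameters of vanilla prompt tuning: l prompt vectors of width d."""
    return l * d


def dept_params(m: int, r: int, s: int, d: int) -> int:
    """Trainable parameters of the decomposed variant (assumed shapes):
    a shorter prompt (m x d) plus low-rank factors A (s x r) and B (r x d)."""
    return m * d + s * r + r * d


def max_rank(l: int, m: int, s: int, d: int) -> int:
    """Largest rank r that keeps the decomposed variant within the PT budget l * d."""
    if not 0 <= m <= l:
        raise ValueError("the shorter prompt must satisfy 0 <= m <= l")
    return (l - m) * d // (s + d)


d, s, l = 768, 256, 100  # hypothetical sizes for illustration
for m in (80, 60, 40):
    r = max_rank(l, m, s, d)
    print(m, r, dept_params(m, r, s, d), pt_params(l, d))
# 80 15 76800 76800
# 60 30 76800 76800
# 40 45 76800 76800
```

Shortening the prompt frees budget that the low-rank factors absorb. In this sketch the total never exceeds l × d, because the rank is rounded down.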
Findings
Outperforms state-of-the-art PEFT methods on 23 NLP and VL tasks.
More efficient as model size increases.
Seamlessly integrates with few-shot learning and various architectures.
Abstract
Prompt tuning (PT), where a small number of trainable soft (continuous) prompt vectors are affixed to the input of language models (LM), has shown promising results across various tasks and models for parameter-efficient fine-tuning (PEFT). PT stands out from other PEFT approaches because it maintains competitive performance with fewer trainable parameters and does not drastically scale up its parameters as the model size expands. However, PT introduces additional soft prompt tokens, leading to longer input sequences. Because of the Transformer's quadratic complexity, this significantly increases training and inference time and memory usage. This is particularly concerning for Large Language Models (LLMs) that face heavy daily querying. To address this issue, we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are…
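The decomposition can be sketched in plain Python lists to stay dependency-free. This is an illustration, not the authors' code. It assumes that the product A·B (shape s × d) is added position-wise to the frozen input embeddings and that the shorter soft prompt is then prepended. The function names and toy sizes are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive product of an (n x k) matrix and a (k x p) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dept_input(embeds: Matrix, prompt: Matrix, a: Matrix, b: Matrix) -> Matrix:
    """Build the model input for one sequence (hypothetical helper).

    embeds: frozen token embeddings, n x d with n <= s
    prompt: trainable shorter soft prompt, m x d
    a, b:   trainable low-rank factors, s x r and r x d
    """
    n = len(embeds)
    if n > len(a):
        raise ValueError("sequence longer than the maximum length s")
    offset = matmul(a[:n], b)  # position-wise update for the first n positions
    updated = [[e + o for e, o in zip(row_e, row_o)]
               for row_e, row_o in zip(embeds, offset)]
    return [row[:] for row in prompt] + updated  # (m + n) x d


embeds = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # n = 3, d = 2
prompt = [[0.5, 0.5]]                          # m = 1
a = [[1.0], [0.0], [2.0], [0.0]]               # s = 4, r = 1
b = [[0.1, -0.1]]                              # r = 1, d = 2
out = dept_input(embeds, prompt, a, b)
print(len(out))  # 4
```

Only `prompt`, `a`, and `b` would be trained. The sequence grows by m rows instead of the original prompt length l.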
Peer Reviews
Decision: ICLR 2024 poster
S1: The idea is very simple and leads to decent improvements over the baseline methods. The paper is also very easy to read and understand. S2: The experiments are thorough enough; however, I have some mild additional suggestions that might make the experimental section more complete.
W1: Some important baseline methods, such as IA3, are missing. See questions below. W2: The idea is interesting; however, more intuition on why it works would strengthen the paper.
To the best of the reviewer’s knowledge, the method proposed in this paper, DePT, is novel. The authors also provide solid intuitions and reasoning for this method. Beyond the method's construction, the experiments are comprehensive. I also appreciate the authors’ efforts in organizing the anonymous project code that covers the experiments.
The key contribution of DePT lies in optimizing both a soft context and the vocabulary in an efficient manner. The decomposition idea, although novel in its current form, is incremental relative to current PEFT methods. Existing work (e.g. [1]) has also explored tuning subsets of the vocabulary as a form of PEFT. That said, DePT still has the advantage of efficient vocabulary tuning. The 20% efficiency advantage is also shown for only one soft prompt length, 100. It w…
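The reviewer's remark that the efficiency advantage was shown at a single prompt length of 100 can be made concrete with a rough proxy. This is a minimal sketch. It assumes cost is dominated by the n² query-key scores of self-attention over the full input (prompt plus text) and ignores feed-forward layers, which scale linearly. It is therefore not a prediction of measured time or memory. The input length of 256 and the shorter prompt lengths are hypothetical.

```python
def attention_pairs(text_len: int, prompt_len: int) -> int:
    """Query-key scores one self-attention head computes per layer."""
    n = text_len + prompt_len
    return n * n


def relative_saving(text_len: int, long_prompt: int, short_prompt: int) -> float:
    """Fraction of attention scores saved by shortening the prompt."""
    full = attention_pairs(text_len, long_prompt)
    return 1 - attention_pairs(text_len, short_prompt) / full


text_len = 256  # hypothetical input length
for short in (80, 60, 40):
    print(short, round(relative_saving(text_len, 100, short), 3))
# 80 0.109
# 60 0.212
# 40 0.309
```

The saving depends on the input length as well as the prompt lengths: for fixed prompt lengths it shrinks as `text_len` grows.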
**Paper quality.** The paper is well written, and its organization is clear and well thought out. I enjoyed reading the paper. **Extensive experiments.** The paper presents extensive experiments showing improved results over prompt tuning while being more efficient during training and inference. The authors have done a great job comparing the work with other recent methods in the literature and show that DePT outperforms them on GLUE and SuperGLUE. They further provide evidence of their…
**The architecture is not well motivated.** The architecture appears to be a combination of prompt tuning and LoRA. Unlike LoRA, however, DePT still pays for a longer input sequence at inference time because of its prompt. DePT can match the base model's inference speed, as LoRA does, when the prompt length is 0. However, Figure 3 shows that performance in that setting is about 20 points below the DePT performance reported in Table 1. Furthermore, decomposing the prompts does not offer any concep…
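The reviewer's point that DePT with prompt length 0 can match the base model's inference speed follows from the shape of the update. This is a minimal sketch. It assumes the low-rank update is a position-wise offset ΔE = A·B of shape s × d added to the input embeddings. After training, ΔE can be computed once and cached, so at m = 0 inference adds no tokens and costs only one elementwise addition. The class and method names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive product of an (n x k) matrix and a (k x p) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


class CachedOffset:
    """Precomputes the s x d offset A @ B once, then reuses it for every input."""

    def __init__(self, a: Matrix, b: Matrix) -> None:
        self.offset = matmul(a, b)  # computed a single time after training

    def apply(self, embeds: Matrix) -> Matrix:
        """Return embeds + offset for the first len(embeds) positions (no prompt, m = 0)."""
        if len(embeds) > len(self.offset):
            raise ValueError("sequence longer than the maximum length s")
        return [[e + o for e, o in zip(row_e, row_o)]
                for row_e, row_o in zip(embeds, self.offset)]


a = [[1.0], [2.0], [0.0]]  # s = 3, r = 1 (toy values)
b = [[0.5, -1.0]]          # r = 1, d = 2
cache = CachedOffset(a, b)
out = cache.apply([[1.0, 1.0], [0.0, 0.0]])
print(out)  # [[1.5, 0.0], [1.0, -2.0]]
```

Unlike a merged LoRA weight, the cached offset is indexed by position rather than folded into a weight matrix. In this sketch it still leaves the sequence length unchanged.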
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
