ADePT: Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
Pengwei Tang, Xiaolin Hu, Yong Liu

TL;DR
ADePT is an adaptive prompt tuning method that uses a small neural network to learn token-specific embedding offsets, improving adaptation across diverse NLP tasks without adding inference time or trainable parameters relative to vanilla PT.
Contribution
The paper proposes ADePT, an adaptive prompt tuning approach that replaces DePT's position-based embedding offsets with token-specific offsets computed by a feed-forward neural network, improving both generalization across model inputs and optimization of the offsets.
Findings
ADePT outperforms existing parameter-efficient tuning methods on 23 NLP tasks.
ADePT surpasses full fine-tuning in certain scenarios.
ADePT maintains inference speed and parameter count comparable to vanilla PT.
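The parameter comparison in the last finding comes down to simple arithmetic. The sketch below counts trainable parameters for vanilla PT, DePT, and an ADePT-style module. It assumes the ADePT offset network is a two-layer feed-forward network with biases and a bottleneck of width `bottleneck`. That form, the function names, and every size used (hidden size 768, prompt lengths 100/40/60, maximum sequence length 256, widths 45 and 19) are illustrative assumptions, not values stated on this page.

```python
def pt_params(prompt_len: int, d: int) -> int:
    """Vanilla PT: one d-dimensional vector per soft prompt token."""
    return prompt_len * d


def dept_params(prompt_len: int, max_len: int, rank: int, d: int) -> int:
    """DePT: shorter soft prompt plus low-rank matrices A (max_len x rank) and B (rank x d)."""
    return prompt_len * d + max_len * rank + rank * d


def adept_params(prompt_len: int, bottleneck: int, d: int) -> int:
    """ADePT-style: shorter soft prompt plus an assumed two-layer FFN d -> bottleneck -> d."""
    ffn = d * bottleneck + bottleneck + bottleneck * d + d  # two weight matrices, two biases
    return prompt_len * d + ffn


D = 768  # assumed hidden size, for illustration only

budgets = {
    "PT": pt_params(prompt_len=100, d=D),
    "DePT": dept_params(prompt_len=40, max_len=256, rank=45, d=D),
    "ADePT-style": adept_params(prompt_len=60, bottleneck=19, d=D),
}

for name, count in budgets.items():
    print(f"{name:12s} {count:,d}")
# PT           76,800
# DePT         76,800
# ADePT-style  76,051
```

Under these assumed sizes the three budgets land within about one percent of each other, which is the sense in which a shorter prompt can pay for the offset module.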
Abstract
Prompt Tuning (PT) enables the adaptation of Pre-trained Large Language Models (PLMs) to downstream tasks by optimizing a small amount of soft virtual tokens, which are prepended to the input token embeddings. Recently, Decomposed Prompt Tuning (DePT) has demonstrated superior adaptation capabilities by decomposing the soft prompt into a shorter soft prompt and a pair of low-rank matrices. The product of the pair of low-rank matrices is added to the input token embeddings to offset them. Additionally, DePT achieves faster inference compared to PT due to the shorter soft prompt. However, in this paper, we find that the position-based token embedding offsets of DePT restrict its ability to generalize across diverse model inputs, and that the shared embedding offsets across many token embeddings result in sub-optimization. To tackle these issues, we introduce Adaptive Decomposed Prompt…
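The two offset schemes described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the toy sizes, the variable names, and the two-layer ReLU form of the offset network are all assumptions. DePT-style offsets come from the product of two low-rank matrices indexed by position; ADePT-style offsets come from a small network applied to each token embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
d, s, m, r = 16, 8, 4, 2  # toy sizes: hidden dim, sequence length, prompt length, rank

E = rng.normal(size=(s, d))  # input token embeddings for one sequence
P = rng.normal(size=(m, d))  # short soft prompt (trainable)

# DePT-style: low-rank pair whose product has one offset row per position.
A = rng.normal(size=(s, r))
B = rng.normal(size=(r, d))


def dept_input(E, P, A, B):
    offsets = (A @ B)[: E.shape[0]]  # row i is the offset for position i
    return np.concatenate([P, E + offsets], axis=0)


# ADePT-style: assumed two-layer feed-forward network shared by all tokens.
W1 = rng.normal(size=(d, r))
b1 = np.zeros(r)
W2 = rng.normal(size=(r, d))
b2 = np.zeros(d)


def ffn_offsets(E, W1, b1, W2, b2):
    # Each row of E is mapped to its own offset; position plays no role.
    return np.maximum(E @ W1 + b1, 0.0) @ W2 + b2


def adept_input(E, P, W1, b1, W2, b2):
    return np.concatenate([P, E + ffn_offsets(E, W1, b1, W2, b2)], axis=0)


X_dept = dept_input(E, P, A, B)
X_adept = adept_input(E, P, W1, b1, W2, b2)
print(X_dept.shape, X_adept.shape)  # (12, 16) (12, 16)
```

In both cases the model sees `m + s` input vectors, so the sequence length fed to the model is the same; only the way the offsets are produced differs.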
Peer Reviews
Decision: ICLR 2025 Poster
This paper introduces the Adaptive Decomposed Prompt Tuning (ADePT) method, which innovatively addresses the generalization limitations caused by fixed token embedding offsets in traditional DePT methods. ADePT achieves excellent adaptability without increasing inference time or requiring additional parameters. The analysis is thorough, and the method is both simple and effective, offering new directions for future research. Additionally, the experiments are conducted rigorously, and the writ
I believe this method may lack generality, especially when applied to large language models and long-text tasks. The main reason for questioning this is whether feedforward neural networks possess sufficient semantic understanding capabilities. Additionally, there is room for further optimization in the figure.
1. The paper provides a comprehensive overview of Parameter-Efficient Fine-tuning (PEFT), effectively situating ADePT within the broader research landscape and highlighting its contributions. 2. ADePT is intuitive and well-motivated. The arguments and experiments in Section 3.2 convincingly demonstrate the limitation of DePT acting as a low-rank absolute positional embedding, paving the way for ADePT, which instead uses a token-wise MLP to calculate embedding offsets. 3. The experiments conducted
1. The robustness of the experiments with the 3B model does not match the standards set by the 220M-scale evaluations. Notably, the selection of fewer benchmark tasks without clear justification, as well as the omission of significant baselines such as Adapters and LoRA, weakens the overall experimental credibility for the 3B model. 2. The performance improvements of ADePT over PT for the T5-3B model are only 0.1 or 0.2 points for each task in Table 5. This tiny margin on a selected set of tasks ma
The paper introduces the other work in the space really well and does a good job contextualizing itself among that work. The pilot experiments highlighting the weakness of DePT are a good motivation for the work. The paper compares to a lot of different baselines, including PEFT methods beyond just prompt tuning. The paper evaluates the method on a lot of different datasets, increasing the trust you can put in getting good results if you use it on your own task. The paper uses multiple diff
The weakness of DePT is outlined in the paper as its "fixed token embedding offsets". This point would be much clearer if it were reframed: the DePT offsets are "position-based", while the ADePT offsets are "content/token-based". Both are "fixed token embedding offsets" (the ADePT output is fixed once the input token is known; it isn't contextual). This framing would make a lot of their examples about the issues much clearer. For example, the section about the [t1, t2] being added causing a shift if
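The position-based versus token-based distinction drawn in this review can be checked with a small experiment. The sketch below is an illustration under assumptions: the sizes, the random stand-in embeddings, and the two-layer ReLU offset network are hypothetical. It prepends two extra token embeddings, standing in for [t1, t2], and compares the offsets the original tokens receive before and after.

```python
import numpy as np

rng = np.random.default_rng(0)
d, s, r = 8, 6, 2  # toy sizes: hidden dim, max sequence length, rank

# Position-based offsets: one row of A @ B per position.
A = rng.normal(size=(s, r))
B = rng.normal(size=(r, d))

# Token-based offsets: assumed small feed-forward network shared by all tokens.
W1, b1 = rng.normal(size=(d, r)), rng.normal(size=r)
W2, b2 = rng.normal(size=(r, d)), rng.normal(size=d)


def position_offsets(E):
    return (A @ B)[: len(E)]  # depends only on where a token sits


def token_offsets(E):
    return np.maximum(E @ W1 + b1, 0.0) @ W2 + b2  # depends only on the embedding


tokens = rng.normal(size=(4, d))  # the original input
prefix = rng.normal(size=(2, d))  # stands in for the added [t1, t2]
shifted = np.concatenate([prefix, tokens], axis=0)

# Offsets received by the original four tokens, before and after the shift.
pos_before, pos_after = position_offsets(tokens), position_offsets(shifted)[2:]
tok_before, tok_after = token_offsets(tokens), token_offsets(shifted)[2:]

position_offsets_changed = not np.allclose(pos_before, pos_after)
token_offsets_unchanged = np.allclose(tok_before, tok_after)
print(position_offsets_changed, token_offsets_unchanged)  # True True
```

The token-based offsets are still fixed in the reviewer's sense: each is a deterministic function of a single token embedding and never looks at neighbouring tokens.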
Taxonomy
Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques · Analog and Mixed-Signal Circuit Design
