Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning
Zhen-Ru Zhang, Chuanqi Tan, Haiyang Xu, Chengyu Wang, Jun Huang, Songfang Huang

TL;DR
This paper introduces Adaptive Prefix Tuning (APT), a method that dynamically adjusts prefix vectors at both token and layer levels to improve parameter-efficient fine-tuning of large language models, demonstrating enhanced performance on downstream tasks.
Contribution
The paper proposes a novel adaptive prefix tuning method that tailors prefix vectors to each layer, improving fine-tuning efficiency and effectiveness over fixed prefix approaches.
Findings
APT outperforms fixed prefix tuning on SuperGLUE and NER datasets.
Adaptive prefix demonstrates better layer-specific representation.
Variable prefix improves tuning efficiency and task performance.
Abstract
Fine-tuning all parameters of a large pre-trained language model for each downstream task is prohibitively expensive. Parameter-efficient fine-tuning, which optimizes only a few task-specific parameters while keeping the pre-trained model frozen, has therefore attracted attention. In this work, we focus on prefix tuning, which optimizes only continuous prefix vectors (i.e., pseudo tokens) inserted into Transformer layers. Based on the observation that the learned syntactic and semantic representations vary considerably across layers, we argue that an adaptive prefix can be tailored to each layer better than a fixed one, making fine-tuning more effective and efficient. Thus, we propose Adaptive Prefix Tuning (APT), which adjusts the prefix at both the fine-grained token level and the coarse-grained layer level with a gate mechanism. Experiments on the SuperGLUE and NER datasets show the…
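The abstract names a gate mechanism acting at the token level and the layer level but, as excerpted here, does not give its exact form. The following is a minimal sketch under stated assumptions, not the authors' implementation: the token-level gate is assumed to be a sigmoid over a linear projection of the mean-pooled hidden states from the previous layer, and the layer-level gate is assumed to be a single learned scalar per layer. The names gated_prefix, gate_weights, and layer_scale are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean_pool(hidden):
    """Average a list of token vectors into one vector."""
    n = len(hidden)
    return [sum(tok[j] for tok in hidden) / n for j in range(len(hidden[0]))]


def gated_prefix(prefix, hidden_prev, gate_weights, layer_scale):
    """Rescale one layer's prefix vectors (illustrative assumption only).

    prefix       : num_prefix vectors of size d (the pseudo tokens)
    hidden_prev  : seq_len vectors of size d from the previous layer
    gate_weights : num_prefix vectors of size d (hypothetical gate projection)
    layer_scale  : one scalar for the whole layer (hypothetical layer gate)
    """
    pooled = mean_pool(hidden_prev)
    gated = []
    for vec, w in zip(prefix, gate_weights):
        # Token-level gate: one value in (0, 1) per pseudo token.
        alpha = sigmoid(sum(wj * pj for wj, pj in zip(w, pooled)))
        # Layer-level gate: the same scalar applied to every pseudo token.
        gated.append([layer_scale * alpha * v for v in vec])
    return gated


# Toy dimensions and random values, purely for demonstration.
random.seed(0)
d, num_prefix, seq_len = 4, 3, 5
prefix = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_prefix)]
hidden_prev = [[random.gauss(0, 1) for _ in range(d)] for _ in range(seq_len)]
gate_weights = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_prefix)]

adapted = gated_prefix(prefix, hidden_prev, gate_weights, layer_scale=0.5)
```

In this sketch the pre-trained model would stay frozen, and only prefix, gate_weights, and layer_scale would be trained. With a fixed prefix, by contrast, every layer would use its pseudo tokens unscaled regardless of the input.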
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Absolute Position Encodings · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Attention Is All You Need · Linear Layer · Label Smoothing · Adam
