Ahead-of-Time P-Tuning
Daniil Gavrilov, Nikita Balagansky

TL;DR
Ahead-of-Time (AoT) P-Tuning is a parameter-efficient fine-tuning approach for pre-trained language models that adds input-dependent biases before each Transformer layer. It outperforms BitFit on GLUE and SuperGLUE while adding negligible inference overhead.
Contribution
The paper introduces AoT P-Tuning, a parameter-efficient fine-tuning method that adds an input-dependent bias before each Transformer layer and enables multi-task inference with a single backbone language model.
Findings
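Outperforms BitFit on the GLUE and SuperGLUE benchmarks with RoBERTa and DeBERTa models
Comparable to or better than other baseline methods for efficient fine-tuning
Negligible inference overhead compared to established baseline methods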
Abstract
In this paper, we propose Ahead-of-Time (AoT) P-Tuning, a novel parameter-efficient fine-tuning method for pre-trained Language Models (LMs) that adds input-dependent bias before each Transformer layer. We evaluate AoT P-Tuning on GLUE and SuperGLUE benchmarking datasets using RoBERTa and DeBERTa models, showing that it outperforms BitFit and is comparable to or better than other baseline methods for efficient fine-tuning. Additionally, we assess the inference overhead of AoT P-Tuning and demonstrate that it introduces negligible overhead compared to established baseline methods. Our method enables multi-task inference with a single backbone LM, making it a practical solution for real-world applications.
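The core idea in the abstract, an input-dependent bias added before each layer of a frozen backbone, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the bias is looked up from a per-layer table indexed by token id, and it uses a toy position-wise layer (`tanh` of a fixed linear map) as a stand-in for a real Transformer layer. All names (`frozen_weights`, `bias_tables`, `forward`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, n_layers = 50, 8, 2

# Frozen backbone (assumption: toy stand-in layers, not real Transformer blocks).
embeddings = rng.normal(size=(vocab_size, d_model))
frozen_weights = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(n_layers)]

def make_bias_tables():
    # One trainable table per layer, one row per vocabulary token.
    # Zero initialization leaves the backbone's behavior unchanged at the start.
    return [np.zeros((vocab_size, d_model)) for _ in range(n_layers)]

def forward(token_ids, bias_tables):
    hidden = embeddings[token_ids]
    for W, P in zip(frozen_weights, bias_tables):
        hidden = hidden + P[token_ids]  # input-dependent bias, added before the layer
        hidden = np.tanh(hidden @ W)    # frozen layer (toy stand-in)
    return hidden

token_ids = np.array([3, 7, 3])

# With all-zero tables the output is just the frozen backbone's output.
out_zero = forward(token_ids, make_bias_tables())

# A second "task" keeps the same backbone and only swaps the bias tables.
task_tables = make_bias_tables()
task_tables[0][7] = 1.0  # pretend training moved the layer-0 bias for token 7
out_task = forward(token_ids, task_tables)
```

Because the backbone weights are shared and only the bias tables differ, serving several tasks in this sketch amounts to selecting a different set of tables per request, which is the sense in which a single backbone can support multi-task inference. The lookup `P[token_ids]` is a plain row gather, so it adds little work on top of the layer itself.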
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · WordPiece · Linear Layer · Weight Decay · Attention Dropout · Position-Wise Feed-Forward Layer · Dense Connections
