Ahead-of-Time P-Tuning

Daniil Gavrilov; Nikita Balagansky

arXiv:2305.10835 · cs.LG · May 19, 2023 · 1 citation

TL;DR

Ahead-of-Time P-Tuning is a new parameter-efficient fine-tuning approach for pre-trained language models that adds input-dependent biases before each Transformer layer, improving performance on benchmarks with minimal overhead.

Contribution

Introduces AoT P-Tuning, a parameter-efficient fine-tuning method for pre-trained language models that enables multi-task inference with a single backbone model.

Findings

01

Outperforms BitFit on GLUE and SuperGLUE benchmarks

02

Comparable to or better than other baseline methods for efficient fine-tuning

03

Negligible inference overhead

Abstract

In this paper, we propose Ahead-of-Time (AoT) P-Tuning, a novel parameter-efficient fine-tuning method for pre-trained Language Models (LMs) that adds input-dependent bias before each Transformer layer. We evaluate AoT P-Tuning on GLUE and SuperGLUE benchmarking datasets using RoBERTa and DeBERTa models, showing that it outperforms BitFit and is comparable to or better than other baseline methods for efficient fine-tuning. Additionally, we assess the inference overhead of AoT P-Tuning and demonstrate that it introduces negligible overhead compared to established baseline methods. Our method enables multi-task inference with a single backbone LM, making it a practical solution for real-world applications.
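
The idea of an input-dependent bias added before each Transformer layer can be illustrated with a minimal sketch. It assumes the bias is looked up from a per-layer table indexed by token ID, which is one plausible reading of "input-dependent" and is not spelled out in the abstract. The names `add_aot_bias`, `forward`, `tables`, and `layers` are hypothetical, and the backbone layers are stand-in callables rather than a real RoBERTa or DeBERTa model.

```python
def add_aot_bias(hidden, input_ids, table):
    """Add a per-token bias to each position's hidden vector.

    hidden:    list of seq_len vectors (lists of floats)
    input_ids: list of seq_len token IDs
    table:     table[token_id] is that token's bias vector for this layer
               (assumed lookup-table form, for illustration only)
    """
    return [
        [h + b for h, b in zip(vec, table[tok])]
        for vec, tok in zip(hidden, input_ids)
    ]


def forward(hidden, input_ids, layers, tables):
    """Run frozen backbone layers, adding the bias before each one."""
    for layer, table in zip(layers, tables):
        hidden = add_aot_bias(hidden, input_ids, table)  # trainable part
        hidden = layer(hidden)                           # frozen backbone layer
    return hidden


# Toy usage: two identity "layers", vocabulary of 2 tokens, hidden size 2.
layers = [lambda h: h, lambda h: h]
tables = [
    [[1.0, 2.0], [10.0, 20.0]],  # layer 1 biases for tokens 0 and 1
    [[0.5, 0.5], [0.0, 0.0]],    # layer 2 biases for tokens 0 and 1
]
out = forward([[0.0, 0.0], [0.0, 0.0]], [1, 0], layers, tables)
```

In this sketch only `tables` differs from task to task while `layers` is shared, which is in the spirit of the abstract's claim about multi-task inference with a single backbone LM.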

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · WordPiece · Linear Layer · Weight Decay · Attention Dropout · Position-Wise Feed-Forward Layer · Dense Connections