TL;DR
This paper empirically evaluates prompt tuning in code intelligence tasks, demonstrating it often surpasses fine-tuning, especially in low-resource scenarios, by effectively leveraging task-specific prompts with pre-trained models like CodeBERT and CodeT5.
Contribution
It provides the first comprehensive empirical comparison of prompt tuning versus fine-tuning in code intelligence tasks, highlighting prompt tuning's advantages in scarce data situations.
Findings
Prompt tuning outperforms fine-tuning across all tested tasks.
The improvements are significant in low-resource scenarios, e.g., a 26% increase in BLEU score.
Prompt tuning is especially effective when task-specific data is limited.
Abstract
Pre-trained models have been shown effective in many code intelligence tasks. These models are pre-trained on large-scale unlabeled corpus and then fine-tuned in downstream tasks. However, as the inputs to pre-training and downstream tasks are in different forms, it is hard to fully explore the knowledge of pre-trained models. Besides, the performance of fine-tuning strongly relies on the amount of downstream data, while in practice, the scenarios with scarce data are common. Recent studies in the natural language processing (NLP) field show that prompt tuning, a new paradigm for tuning, alleviates the above issues and achieves promising results in various NLP tasks. In prompt tuning, the prompts inserted during tuning provide task-specific knowledge, which is especially beneficial for tasks with relatively scarce data. In this paper, we empirically evaluate the usage and effect of…
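To make the idea of "prompts inserted during tuning" concrete, here is a minimal sketch of a hard (textual) prompt for a defect-detection style task. The template wording, the `<mask>` placeholder, the label words, and the example scores are all hypothetical choices for illustration, not the paper's actual configuration. A real setup would take the mask-position scores from a pre-trained model such as CodeBERT.

```python
from dataclasses import dataclass
from typing import Dict, List

# Placeholder mask token (assumption); the real token depends on the tokenizer.
MASK = "<mask>"


@dataclass
class PromptTemplate:
    """Wraps a code snippet in natural-language text containing a mask slot."""
    prefix: str
    suffix: str

    def wrap(self, code: str) -> str:
        # The downstream input now resembles the cloze-style pre-training input.
        return f"{self.prefix} {code} {self.suffix}"


# Verbalizer (hypothetical label words): maps each class to words the model
# might predict at the mask position.
VERBALIZER: Dict[str, List[str]] = {
    "defective": ["bad", "buggy"],
    "clean": ["good", "fine"],
}


def predict(mask_word_scores: Dict[str, float]) -> str:
    """Pick the class whose label words receive the highest total score."""
    totals = {
        label: sum(mask_word_scores.get(word, 0.0) for word in words)
        for label, words in VERBALIZER.items()
    }
    return max(totals, key=totals.get)


template = PromptTemplate(prefix="The code", suffix=f"is {MASK}.")
prompt = template.wrap("int x = 1 / 0;")

# Made-up scores standing in for a model's output at the mask position.
label = predict({"bad": 0.6, "good": 0.3, "buggy": 0.05})
```

In this sketch the classification task is recast as filling in the masked word, so the model is queried in the same form it saw during pre-training instead of through a newly initialized classification head.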
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Multi-Head Attention · Layer Normalization · Dropout · Residual Connection · SentencePiece · Inverse Square Root Schedule
