Continued Pretraining for Better Zero- and Few-Shot Promptability
Zhaofeng Wu, Robert L. Logan IV, Pete Walsh, Akshita Bhagia, Dirk Groeneveld, Sameer Singh, Iz Beltagy

TL;DR
This paper explores how a continued pretraining stage can improve language models' zero- and few-shot performance with prompts. It shows that a simple recipe, incorporating a trainable prompt during multi-task continued pretraining, improves promptability over existing methods.
Contribution
It introduces a straightforward continued pretraining method with trainable prompts that outperforms existing techniques in zero- and few-shot settings.
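To make the idea of a "trainable prompt" concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the prompt is a small matrix of continuous vectors prepended to the token embeddings and updated by gradient descent. The function names (`init_soft_prompt`, `prepend_prompt`, `sgd_step`), the dimensions, and the learning rate are all hypothetical and chosen only for illustration.

```python
import random


def init_soft_prompt(prompt_len, dim, seed=0):
    """Create a (prompt_len x dim) matrix of small random values.

    Hypothetical helper: these vectors play the role of the trainable prompt.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(prompt_len)]


def prepend_prompt(soft_prompt, token_embeddings):
    """Return the sequence a model would consume: prompt vectors, then tokens."""
    if soft_prompt and token_embeddings:
        if len(soft_prompt[0]) != len(token_embeddings[0]):
            raise ValueError("prompt and token embeddings must share a dimension")
    # Copy rows so later in-place updates to the prompt do not alter this sequence.
    return [row[:] for row in soft_prompt] + [row[:] for row in token_embeddings]


def sgd_step(params, grads, lr):
    """Apply one in-place SGD update: params <- params - lr * grads."""
    for p_row, g_row in zip(params, grads):
        for j, g in enumerate(g_row):
            p_row[j] -= lr * g


# Illustrative values only: a 3-vector prompt in front of 2 token embeddings.
soft_prompt = init_soft_prompt(prompt_len=3, dim=4)
tokens = [[0.1] * 4, [0.2] * 4]
sequence = prepend_prompt(soft_prompt, tokens)
```

In a real system the gradients would come from the language-modeling loss computed on `sequence`. Under the assumption above, the same `sgd_step` could update only the prompt (as in prompt tuning) or the prompt together with the model weights; which parameters are updated at each stage is a design choice this sketch leaves open.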
Findings
Continued pretraining with trainable prompts improves promptability by up to 31% relative to existing methods.
MAML-style meta-learning underperforms for promptability enhancement.
Recommendations are provided for optimizing promptability across use cases.
Abstract
Recently introduced language model prompting methods can achieve high accuracy in zero- and few-shot settings while requiring few to no learned task-specific parameters. Nevertheless, these methods still often trail behind full model finetuning. In this work, we investigate if a dedicated continued pretraining stage could improve "promptability", i.e., zero-shot performance with natural language prompts or few-shot performance with prompt tuning. We reveal settings where existing continued pretraining methods lack promptability. We also identify current methodological gaps, which we fill with thorough large-scale experiments. We demonstrate that a simple recipe, continued pretraining that incorporates a trainable prompt during multi-task learning, leads to improved promptability in both zero- and few-shot settings compared to existing methods, up to 31% relative. On the other hand, we…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
