Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao, Adam Fisch, Danqi Chen

TL;DR
This paper introduces LM-BFF, a set of simple techniques for improving few-shot learning in smaller language models through prompt-based fine-tuning and dynamic demonstration selection, significantly outperforming standard fine-tuning.
Contribution
The paper presents LM-BFF, a novel, task-agnostic approach combining prompt-based fine-tuning and dynamic demonstration selection for better few-shot learning in smaller models.
Findings
Up to 30% absolute improvement over standard fine-tuning
11% average improvement across multiple NLP tasks
Effective in low-resource scenarios with only a small number of annotated examples
Abstract
The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and…
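As a rough illustration of the two ideas named in the abstract, the sketch below casts a classification input as a fill-in-the-blank prompt for a masked language model and appends a few labeled demonstrations to the same context. The template string, the label words ("great", "terrible"), the `[MASK]` token, and the function names `build_prompt` and `predict_label` are illustrative assumptions, not the authors' released code. The automatic prompt generation and the selective sampling of demonstrations are not covered here.

```python
from typing import Dict, List, Tuple

MASK = "[MASK]"  # assumed mask token; the real token depends on the model's tokenizer


def build_prompt(
    query: str,
    demonstrations: List[Tuple[str, str]],
    label_words: Dict[str, str],
    template: str = "{sentence} It was {mask}.",  # hypothetical manual template
) -> str:
    """Render the query with a masked label slot, followed by filled-in demonstrations."""
    # The query keeps the mask: the model is asked to fill in a label word here.
    parts = [template.format(sentence=query, mask=MASK)]
    # Each demonstration uses the same template, with the mask replaced by
    # the label word that corresponds to its gold label.
    for text, label in demonstrations:
        parts.append(template.format(sentence=text, mask=label_words[label]))
    return " ".join(parts)


def predict_label(mask_scores: Dict[str, float], label_words: Dict[str, str]) -> str:
    """Pick the label whose label word gets the highest score at the mask position."""
    # Label words missing from mask_scores are treated as having the lowest score.
    return max(label_words, key=lambda lab: mask_scores.get(label_words[lab], float("-inf")))


if __name__ == "__main__":
    label_words = {"positive": "great", "negative": "terrible"}
    demos = [("A gripping story.", "positive"), ("Dull and slow.", "negative")]
    print(build_prompt("No reason to watch.", demos, label_words))
    # Made-up scores standing in for a model's output at the mask position.
    print(predict_label({"great": 0.1, "terrible": 0.7}, label_words))
```

In prompt-based fine-tuning, the scores passed to `predict_label` would come from the language model's output distribution at the mask position, and the model would be fine-tuned so that the correct label word is preferred there.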
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Cosine Annealing · Adam · Byte Pair Encoding · Multi-Head Attention · Weight Decay · Dropout · Softmax · Dense Connections
