Making Pre-trained Language Models Better Few-shot Learners

Tianyu Gao; Adam Fisch; Danqi Chen

arXiv:2012.15723·cs.CL·June 3, 2021·129 cites


Open Access · 5 Repos · 1 Dataset

TL;DR

This paper introduces LM-BFF, a set of simple techniques for improving few-shot learning in smaller language models through prompt-based fine-tuning and dynamic demonstration selection, significantly outperforming standard methods.

Contribution

The paper presents LM-BFF, a novel, task-agnostic approach combining prompt-based fine-tuning and dynamic demonstration selection for better few-shot learning in smaller models.

Findings

01 Up to 30% absolute improvement over standard fine-tuning

02 11% average absolute improvement over standard fine-tuning across multiple NLP tasks

03 Effective in low-resource settings where only a few annotated examples are available

Abstract

The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and…
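The two ingredients named in the abstract, a prompt with a masked label word and demonstrations added to the input context, can be illustrated with a minimal sketch. The template, the label words, the `[MASK]` placeholder, and the helper names `fill` and `build_input` are all hypothetical choices for illustration, and sampling one demonstration per class is an assumption; this is not the authors' released code.

```python
import random

# Placeholder mask token; the real token depends on the model's tokenizer (assumption).
MASK = "[MASK]"

# Hypothetical template and label words for a sentiment task (illustrative only).
TEMPLATE = "{text} It was {word}."
LABEL_WORDS = {"positive": "great", "negative": "terrible"}


def fill(text, word):
    """Render one example with the template, using either a label word or the mask."""
    return TEMPLATE.format(text=text, word=word)


def build_input(query, train_set, rng):
    """Build one input: the masked query followed by one sampled demonstration per class."""
    parts = [fill(query, MASK)]
    for label in sorted(LABEL_WORDS):
        # Sample a demonstration for this class from the few-shot training set.
        candidates = [text for text, lab in train_set if lab == label]
        demo = rng.choice(candidates)
        parts.append(fill(demo, LABEL_WORDS[label]))
    return " ".join(parts)


train = [
    ("A joy to watch.", "positive"),
    ("Dull and lifeless.", "negative"),
]
prompt = build_input("No reason to watch.", train, random.Random(0))
print(prompt)
```

In a setup like this, a masked language model would be fine-tuned to predict the label word at the mask position, so classification reuses the pre-training objective and needs no new classification head.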


Code & Models

Datasets

juny116/few_glue
dataset · 45 downloads


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Cosine Annealing · Adam · Byte Pair Encoding · Multi-Head Attention · Weight Decay · Dropout · Softmax · Dense Connections