Entailment as Few-Shot Learner

Sinong Wang; Han Fang; Madian Khabsa; Hanzi Mao; Hao Ma

arXiv:2104.14690 · cs.CL · May 3, 2021 · 106 cites

TL;DR

This paper introduces EFL, a method that transforms NLP tasks into entailment problems and fine-tunes small language models with minimal data, significantly enhancing their few-shot learning capabilities.

Contribution

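The paper presents EFL, an approach that reformulates NLP tasks as entailment problems so that small LMs can be fine-tuned into stronger few-shot learners from only a handful of examples. The method also combines with contrastive data augmentation and extends to multilingual settings.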

Findings

01 EFL improves few-shot learning performance by 12% over existing methods.

02 EFL achieves results competitive with much larger models such as GPT-3.

03 The approach can be combined with contrastive data augmentation and extended to multilingual tasks.

Abstract

Large pre-trained language models (LMs) have demonstrated remarkable ability as few-shot learners. However, their success hinges largely on scaling model parameters to a degree that makes them challenging to train and serve. In this paper, we propose a new approach, named EFL, that can turn small LMs into better few-shot learners. The key idea of this approach is to reformulate a potential NLP task as an entailment one, and then fine-tune the model with as few as 8 examples. We further demonstrate that our proposed method can be: (i) naturally combined with an unsupervised contrastive learning-based data augmentation method; (ii) easily extended to multilingual few-shot learning. A systematic evaluation on 18 standard NLP tasks demonstrates that this approach improves on various existing SOTA few-shot learning methods by 12%, and yields competitive few-shot performance with 500 times…
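
The reformulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the label descriptions, the function names `to_entailment_pairs` and `predict_label`, and the `entail_score` callable (a stand-in for a fine-tuned entailment model) are all assumptions made for illustration. The idea shown is only that a labeled classification example becomes a set of premise/hypothesis pairs, one per candidate label, and that prediction picks the label whose description the model scores as most entailed.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical label descriptions for a binary sentiment task.
# These strings are illustrative assumptions, not taken from the paper.
LABEL_DESCRIPTIONS: Dict[str, str] = {
    "positive": "This review is positive.",
    "negative": "This review is negative.",
}


def to_entailment_pairs(
    text: str,
    gold_label: str,
    label_descriptions: Dict[str, str] = LABEL_DESCRIPTIONS,
) -> List[Tuple[str, str, int]]:
    """Turn one labeled example into (premise, hypothesis, target) triples.

    target is 1 when the hypothesis describes the gold label (entailment)
    and 0 otherwise (not entailment).
    """
    return [
        (text, description, int(label == gold_label))
        for label, description in label_descriptions.items()
    ]


def predict_label(
    text: str,
    entail_score: Callable[[str, str], float],
    label_descriptions: Dict[str, str] = LABEL_DESCRIPTIONS,
) -> str:
    """Return the label whose description gets the highest entailment score.

    entail_score is a placeholder for a fine-tuned entailment model that
    maps (premise, hypothesis) to a score; it is assumed, not provided here.
    """
    return max(
        label_descriptions,
        key=lambda label: entail_score(text, label_descriptions[label]),
    )


# One labeled example yields one training pair per candidate label.
pairs = to_entailment_pairs("I loved every minute of it.", "positive")
```

In a few-shot setting, a small number of labeled examples would each be expanded this way before fine-tuning, which is one reason a handful of examples can go further than in a plain classification setup.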

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Residual Connection · Attention Dropout · Layer Normalization · Adam · Weight Decay