Prompting ELECTRA: Few-Shot Learning with Discriminative Pre-Trained Models

Mengzhou Xia; Mikel Artetxe; Jingfei Du; Danqi Chen; Ves Stoyanov

arXiv:2205.15223 · cs.CL · October 28, 2022 · 1 citation

TL;DR

This paper adapts prompt-based few-shot learning to ELECTRA, a discriminative pre-trained model, and shows that the resulting method outperforms masked language models on a wide range of tasks without introducing new parameters.

Contribution

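It introduces a prompt-based few-shot learning method for ELECTRA that reuses its discriminative pre-training objective: the model is trained to score the originality of the target options, with no new parameters and no extra computation overhead for multi-token predictions.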

Findings

01

ELECTRA outperforms masked language models in few-shot tasks.

02

The method introduces no new parameters and handles multi-token predictions without extra computation overhead.

03

ELECTRA learns distributions better aligned with downstream tasks.

Abstract

Pre-trained masked language models successfully perform few-shot learning by formulating downstream tasks as text infilling. However, as a strong alternative in full-shot settings, discriminative pre-trained models like ELECTRA do not fit into the paradigm. In this work, we adapt prompt-based few-shot learning to ELECTRA and show that it outperforms masked language models in a wide range of tasks. ELECTRA is pre-trained to distinguish if a token is generated or original. We naturally extend that to prompt-based few-shot learning by training to score the originality of the target options without introducing new parameters. Our method can be easily adapted to tasks involving multi-token predictions without extra computation overhead. Analysis shows that ELECTRA learns distributions that align better with downstream tasks.
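The scoring idea in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper or its repository: `toy_discriminator` is a hand-written stand-in for ELECTRA's discriminator head (which outputs one "replaced" logit per token), and averaging log-probabilities over a multi-token option is an assumption made for illustration, not necessarily the aggregation the authors use.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def originality_score(replaced_logits: Sequence[float]) -> float:
    """Mean log-probability that the given tokens are original.

    A discriminator emits a logit for "this token was replaced", so
    P(original) = 1 - sigmoid(logit) = sigmoid(-logit).
    Averaging over tokens is an illustrative assumption.
    """
    log_probs = [math.log(sigmoid(-z)) for z in replaced_logits]
    return sum(log_probs) / len(log_probs)


def predict(
    prefix: List[str],
    suffix: List[str],
    options: Dict[str, List[str]],
    discriminator: Callable[[List[str]], List[float]],
) -> Tuple[str, Dict[str, float]]:
    """Fill each candidate option into the prompt and keep the most original one."""
    scores = {}
    for label, words in options.items():
        tokens = prefix + words + suffix
        logits = discriminator(tokens)
        # Only the positions occupied by the option contribute to the score.
        span = logits[len(prefix):len(prefix) + len(words)]
        scores[label] = originality_score(span)
    best = max(scores, key=scores.get)
    return best, scores


def toy_discriminator(tokens: List[str]) -> List[float]:
    """Hypothetical stand-in for a trained discriminator (NOT a real model).

    It flags a sentiment word as "replaced" when it clashes with the verb.
    """
    clash = set()
    if "loved" in tokens:
        clash = {"terrible"}
    elif "hated" in tokens:
        clash = {"great"}
    return [2.0 if t in clash else -2.0 for t in tokens]


label, scores = predict(
    prefix=["i", "loved", "it", ".", "it", "was"],
    suffix=["."],
    options={"positive": ["great"], "negative": ["terrible"]},
    discriminator=toy_discriminator,
)
print(label, scores)
```

In this sketch no classification head is added: the prediction comes entirely from the per-token "replaced" logits the discriminator already produces, which mirrors the abstract's point that no new parameters are introduced.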

Code & Models

Repositories

facebookresearch/electra-fewshot-learning
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Dense Connections · Dropout · WordPiece · Weight Decay