ELECTRA is a Zero-Shot Learner, Too

Shiwen Ni; Hung-Yu Kao

arXiv:2207.08141·cs.CL·July 21, 2022

TL;DR

This paper demonstrates that ELECTRA, a discriminative pre-trained model, can achieve state-of-the-art zero-shot NLP performance using a novel prompt learning method based on replaced token detection (RTD), outperforming prompt learning with masked language models (MLMs) such as BERT and RoBERTa.

Contribution

The paper introduces a new RTD-based prompt learning approach for ELECTRA, showing it surpasses MLM models in zero-shot NLP tasks.

Findings

01. In the zero-shot setting, RTD-ELECTRA-large improves average performance by about 8.4% over MLM-RoBERTa-large and about 13.7% over MLM-BERT-large.

02. RTD-ELECTRA-large achieves 90.1% accuracy on SST-2 in the zero-shot setting, with no task-specific training.

03. Pre-trained replaced token detection models outperform masked language models in zero-shot learning.

Abstract

Recently, for few-shot and even zero-shot learning, the new paradigm "pre-train, prompt, and predict" has achieved remarkable results compared with the "pre-train, fine-tune" paradigm. After the success of prompt-based GPT-3, a series of prompt learning methods based on masked language models (MLMs), e.g., BERT and RoBERTa, became popular and widely used. However, another efficient pre-trained discriminative model, ELECTRA, has probably been overlooked. In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using our proposed replaced token detection (RTD)-based prompt learning method. Experimental results show that the ELECTRA model with RTD-prompt learning achieves surprisingly strong, state-of-the-art zero-shot performance. Numerically, compared to MLM-RoBERTa-large and MLM-BERT-large, our RTD-ELECTRA-large achieves an average improvement of about 8.4% and 13.7%, respectively, on…
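
The abstract does not spell out how RTD-based prompting works. The following is a minimal sketch of one plausible reading, not the authors' implementation: each candidate label word is placed into a prompt template, a discriminator scores how likely that word is to have been replaced, and the label whose word looks least replaced is chosen. The template `It was <word> .`, the label words, the whitespace tokenization, and every name below (`build_prompt`, `rtd_zero_shot`, `toy_scorer`, `LABEL_WORDS`) are assumptions made for illustration. In particular, `toy_scorer` is a hand-written stand-in, not ELECTRA.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: given a token sequence, return one probability per
# token that the token was "replaced" (i.e. does not fit its context).
ReplacedScorer = Callable[[Sequence[str]], List[float]]

# Assumed label words for a binary sentiment task (illustrative only).
LABEL_WORDS: Dict[str, str] = {"positive": "great", "negative": "terrible"}


def build_prompt(text: str, label_word: str) -> Tuple[List[str], int]:
    """Append the assumed template 'It was <label_word> .' to the input.

    Returns the prompt tokens and the index of the label word within them.
    """
    tokens = text.split()  # whitespace tokenization, for the sketch only
    label_index = len(tokens) + 2  # skip the template tokens "It" and "was"
    return tokens + ["It", "was", label_word, "."], label_index


def rtd_zero_shot(
    text: str, label_words: Dict[str, str], scorer: ReplacedScorer
) -> str:
    """Pick the label whose word the scorer considers least likely replaced."""
    replaced_prob: Dict[str, float] = {}
    for label, word in label_words.items():
        tokens, index = build_prompt(text, word)
        replaced_prob[label] = scorer(tokens)[index]
    return min(replaced_prob, key=replaced_prob.get)


def toy_scorer(tokens: Sequence[str]) -> List[float]:
    """Stand-in for a real discriminator; NOT ELECTRA.

    Flags a sentiment word as likely replaced when it contradicts a cue word
    found elsewhere in the sequence.
    """
    cue_to_word = {"wonderful": "great", "boring": "terrible"}
    sentiment_words = set(cue_to_word.values())
    expected = {cue_to_word[t] for t in tokens if t in cue_to_word}
    return [
        0.9 if t in sentiment_words and expected and t not in expected else 0.1
        for t in tokens
    ]


# Usage: classify one sentence without any task-specific training.
prediction = rtd_zero_shot("A wonderful film", LABEL_WORDS, toy_scorer)
print(prediction)
```

With a real model, `toy_scorer` would be swapped for a function that returns a pre-trained discriminator's per-token replaced probabilities. The selection logic in `rtd_zero_shot` would stay the same.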

Code & Models

Repositories

nishiwen1214/rtd-electra · tf · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Cosine Annealing · Dense Connections · Weight Decay · Linear Warmup With Linear Decay · Linear Warmup With Cosine Annealing