Pre-trained Token-replaced Detection Model as Few-shot Learner

Zicheng Li; Shoushan Li; Guodong Zhou

arXiv:2203.03235 · cs.CL · March 22, 2023 · 6 cites

TL;DR

This paper proposes a few-shot learning method built on pre-trained token-replaced detection models such as ELECTRA. It reformulates classification and regression tasks as token-replaced detection problems and outperforms few-shot learners based on masked language models across 16 datasets.

Contribution

The paper shows that pre-trained token-replaced detection models can serve as few-shot learners, offering an alternative to masked language models that performs better in both one-sentence and two-sentence tasks.

Findings

01 Outperforms few-shot learners based on masked language models

02 Reformulates classification and regression as token-replaced detection

03 Evaluated on 16 datasets, covering one-sentence and two-sentence tasks

Abstract

Pre-trained masked language models have demonstrated remarkable ability as few-shot learners. In this paper, as an alternative, we propose a novel approach to few-shot learning with pre-trained token-replaced detection models like ELECTRA. In this approach, we reformulate a classification or a regression task as a token-replaced detection problem. Specifically, we first define a template and label description words for each task and put them into the input to form a natural language prompt. Then, we employ the pre-trained token-replaced detection model to predict which label description word is the most original (i.e., least replaced) among all label description words in the prompt. A systematic evaluation on 16 datasets demonstrates that our approach outperforms few-shot learners with pre-trained masked language models in both one-sentence and two-sentence learning tasks.
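
The prediction step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the template ("It was ..."), the label description words, and the `toy_discriminator` function are all hypothetical stand-ins. In practice the per-token "replaced" probabilities would come from a pre-trained discriminator such as ELECTRA rather than a hand-written rule.

```python
from typing import Callable, Dict, List, Sequence


def build_prompt(sentence: str, label_words: Sequence[str]) -> List[str]:
    """Append a (hypothetical) template and all label description words to the input."""
    return sentence.split() + ["It", "was"] + list(label_words)


def predict_label(
    sentence: str,
    label_words: Dict[str, str],
    replaced_prob: Callable[[List[str]], List[float]],
) -> str:
    """Pick the label whose description word looks most original (least replaced)."""
    words = list(label_words.values())
    tokens = build_prompt(sentence, words)
    probs = replaced_prob(tokens)  # one "replaced" probability per token
    offset = len(tokens) - len(words)  # label words sit at the end of the prompt
    scores = {label: probs[offset + i] for i, label in enumerate(label_words)}
    return min(scores, key=scores.get)


def toy_discriminator(tokens: List[str]) -> List[float]:
    """Hand-written stand-in for a token-replaced detection model (assumption)."""
    negative_cues = {"boring", "dull"}
    is_negative = any(t.lower() in negative_cues for t in tokens)
    probs = []
    for t in tokens:
        if t == "great":
            probs.append(0.9 if is_negative else 0.1)
        elif t == "terrible":
            probs.append(0.1 if is_negative else 0.9)
        else:
            probs.append(0.05)
    return probs


labels = {"positive": "great", "negative": "terrible"}
print(predict_label("a boring film", labels, toy_discriminator))
print(predict_label("a fun film", labels, toy_discriminator))
```

Because the decision only compares the discriminator's outputs at the label-word positions, this sketch adds no new classification head; swapping the stub for a real discriminator would leave `predict_label` unchanged.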

Code & Models

Repositories

cjfarmer/trd_fsl
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · WordPiece · Weight Decay · Layer Normalization · Attention Dropout