Pre-trained Token-replaced Detection Model as Few-shot Learner
Zicheng Li, Shoushan Li, Guodong Zhou

TL;DR
This paper introduces a few-shot learning method built on pre-trained token-replaced detection models such as ELECTRA. It reformulates classification and regression tasks as token-replaced detection problems and outperforms few-shot learners based on masked language models across 16 datasets.
Contribution
The paper presents a new approach to few-shot learning that uses pre-trained token-replaced detection models as an alternative to masked language models, with better results on both one-sentence and two-sentence tasks.
Findings
Outperforms few-shot learners based on pre-trained masked language models
Reformulates classification and regression tasks as token-replaced detection problems using a template and label description words
Evaluated on 16 datasets covering one-sentence and two-sentence tasks
Abstract
Pre-trained masked language models have demonstrated remarkable ability as few-shot learners. In this paper, as an alternative, we propose a novel approach to few-shot learning with pre-trained token-replaced detection models like ELECTRA. In this approach, we reformulate a classification or a regression task as a token-replaced detection problem. Specifically, we first define a template and label description words for each task and put them into the input to form a natural language prompt. Then, we employ the pre-trained token-replaced detection model to predict which label description word is the most original (i.e., least replaced) among all label description words in the prompt. A systematic evaluation on 16 datasets demonstrates that our approach outperforms few-shot learners with pre-trained masked language models in both one-sentence and two-sentence learning tasks.
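The prediction step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the template ("It was ..."), the label description words, the whitespace tokenization, and the `toy_discriminator` function are all assumptions made for the example. In practice the per-token "replaced" probabilities would come from a pre-trained token-replaced detection model such as an ELECTRA discriminator.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical label description words for a sentiment task (assumed, not from the paper).
LABEL_WORDS: Dict[str, str] = {"great": "positive", "terrible": "negative"}


def build_prompt(text: str, label_words: Sequence[str]) -> List[str]:
    """Append an assumed template that contains every label description word."""
    return text.split() + ["It", "was"] + list(label_words) + ["."]


def predict_label(
    text: str,
    label_words: Dict[str, str],
    replaced_prob: Callable[[List[str]], List[float]],
) -> str:
    """Pick the label whose description word looks least replaced."""
    words = list(label_words)
    tokens = build_prompt(text, words)
    probs = replaced_prob(tokens)  # one "replaced" probability per token
    start = len(tokens) - 1 - len(words)  # index of the first label word
    scores = {w: probs[start + i] for i, w in enumerate(words)}
    most_original = min(scores, key=scores.get)
    return label_words[most_original]


def toy_discriminator(tokens: List[str]) -> List[float]:
    """Stand-in for a real discriminator: a hand-written rule, for illustration only."""
    positive_context = any(t in {"loved", "wonderful"} for t in tokens)
    probs = []
    for t in tokens:
        if t == "great":
            probs.append(0.1 if positive_context else 0.9)
        elif t == "terrible":
            probs.append(0.9 if positive_context else 0.1)
        else:
            probs.append(0.05)
    return probs


print(predict_label("I loved this film", LABEL_WORDS, toy_discriminator))
print(predict_label("A boring and awful plot", LABEL_WORDS, toy_discriminator))
```

The design point the sketch tries to capture is that every label description word appears in the same prompt, so a single pass of the discriminator scores all candidates and the label is read off by comparing their "replaced" probabilities.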
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · WordPiece · Weight Decay · Layer Normalization · Attention Dropout
