Training ELECTRA Augmented with Multi-word Selection
Jiaming Shen, Jialu Liu, Tianqi Liu, Cong Yu, Jiawei Han

TL;DR
This paper introduces an improved ELECTRA pre-training method that combines token replacement detection with multi-word selection, enhancing semantic understanding and efficiency in NLP tasks.
Contribution
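The study proposes a multi-task pre-training approach for ELECTRA in which the discriminator both detects replaced tokens and selects original tokens from candidate sets. Two techniques combine the tasks: attention-based networks for the task-specific heads, and sharing the bottom layers of the generator and the discriminator.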
Findings
Outperforms baseline ELECTRA on GLUE and SQuAD datasets.
Achieves higher accuracy with reduced training resources.
Demonstrates improved semantic understanding in NLP tasks.
Abstract
Pre-trained text encoders such as BERT and its variants have recently achieved state-of-the-art performances on many NLP tasks. While being effective, these pre-training methods typically demand massive computation resources. To accelerate pre-training, ELECTRA trains a discriminator that predicts whether each input token is replaced by a generator. However, this new task, as a binary classification, is less semantically informative. In this study, we present a new text encoder pre-training method that improves ELECTRA based on multi-task learning. Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets. We further develop two techniques to effectively combine all pre-training tasks: (1) using attention-based networks for task-specific heads, and (2) sharing bottom layers of the generator and the discriminator.…
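The two discriminator objectives described in the abstract can be sketched as a single combined loss. This is a minimal illustration, not the authors' implementation. It assumes the replaced-token-detection head emits one logit per token and the selection head emits one score per candidate at each selected position. The function names and the `selection_weight` parameter are hypothetical, and the attention-based heads and shared generator/discriminator layers are not modeled.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def replaced_token_detection_loss(logits, is_replaced):
    """Mean binary cross-entropy over tokens (label 1 = token was replaced)."""
    total = 0.0
    for logit, label in zip(logits, is_replaced):
        p = sigmoid(logit)
        total += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return total / len(logits)


def multi_word_selection_loss(candidate_scores, original_index):
    """Mean softmax cross-entropy over each position's candidate set.

    candidate_scores[i] holds the scores for the candidates at position i;
    original_index[i] is the index of the original token in that set.
    """
    total = 0.0
    for scores, gold in zip(candidate_scores, original_index):
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[gold]
    return total / len(candidate_scores)


def combined_loss(rtd_logits, is_replaced, candidate_scores, original_index,
                  selection_weight=1.0):
    """Weighted sum of the two losses; the weighting scheme is an assumption."""
    rtd = replaced_token_detection_loss(rtd_logits, is_replaced)
    mws = multi_word_selection_loss(candidate_scores, original_index)
    return rtd + selection_weight * mws


# Toy input: 3 tokens with uninformative logits, 2 positions with 4 candidates each.
loss = combined_loss(
    rtd_logits=[0.0, 0.0, 0.0],
    is_replaced=[0, 1, 0],
    candidate_scores=[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    original_index=[2, 0],
)
```

The selection term is a multi-class problem over a candidate set, so it gives the discriminator a richer training signal per position than the binary replaced/original decision, which is the motivation stated in the abstract.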
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Residual Connection · WordPiece · Attention Dropout · Dense Connections
