Training ELECTRA Augmented with Multi-word Selection

Jiaming Shen; Jialu Liu; Tianqi Liu; Cong Yu; Jiawei Han

arXiv:2106.00139 · cs.CL · March 4, 2022 · 1 citation

TL;DR

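This paper improves ELECTRA pre-training by training the discriminator on two tasks at once: detecting replaced tokens and selecting the original token from a candidate set (multi-word selection). The selection task supplies a more semantically informative signal than binary classification alone.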

Contribution

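The study proposes a multi-task learning approach for ELECTRA in which the discriminator both detects replaced tokens and selects original tokens from candidate sets. Two techniques combine the pre-training tasks: attention-based networks for the task-specific heads, and shared bottom layers between the generator and the discriminator.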

Findings

01 Outperforms baseline ELECTRA on GLUE and SQuAD datasets.

02 Achieves higher accuracy with reduced training resources.

03 Demonstrates improved semantic understanding in NLP tasks.

Abstract

Pre-trained text encoders such as BERT and its variants have recently achieved state-of-the-art performances on many NLP tasks. While being effective, these pre-training methods typically demand massive computation resources. To accelerate pre-training, ELECTRA trains a discriminator that predicts whether each input token is replaced by a generator. However, this new task, as a binary classification, is less semantically informative. In this study, we present a new text encoder pre-training method that improves ELECTRA based on multi-task learning. Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets. We further develop two techniques to effectively combine all pre-training tasks: (1) using attention-based networks for task-specific heads, and (2) sharing bottom layers of the generator and the discriminator.…
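The multi-task objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formulation, so everything here is an assumption: the function names `rtd_loss`, `selection_loss`, and `combined_loss` are hypothetical, the selection task is modeled as a softmax over each candidate set, and the two losses are combined with a hypothetical `selection_weight`. It is a plain-Python illustration of the idea, not the authors' implementation.

```python
import math


def rtd_loss(logits, is_replaced):
    """Mean binary cross-entropy for replaced token detection.

    logits[i] is the discriminator's score that token i was replaced;
    is_replaced[i] is 1 if the generator replaced it, else 0.
    """
    total = 0.0
    for z, y in zip(logits, is_replaced):
        # Numerically stable binary cross-entropy computed from a raw logit.
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def selection_loss(candidate_scores, original_index):
    """Mean softmax cross-entropy for picking the original token.

    candidate_scores[j] holds one score per candidate token at position j;
    original_index[j] is the index of the original token in that set.
    (Assumed formulation: the abstract only says "select original tokens
    from candidate sets".)
    """
    total = 0.0
    for scores, gold in zip(candidate_scores, original_index):
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[gold]
    return total / len(candidate_scores)


def combined_loss(rtd_logits, is_replaced, candidate_scores, original_index,
                  selection_weight=1.0):
    """Weighted sum of the two discriminator losses (weight is hypothetical)."""
    return (rtd_loss(rtd_logits, is_replaced)
            + selection_weight * selection_loss(candidate_scores, original_index))


# Uninformative scores: the result equals log(2) + log(4).
loss = combined_loss(
    rtd_logits=[0.0, 0.0, 0.0],
    is_replaced=[0, 1, 0],
    candidate_scores=[[0.0, 0.0, 0.0, 0.0]],
    original_index=[0],
)
print(loss)
```

The contrast the abstract draws is visible in the two loss terms: replaced token detection makes a binary decision per token, while selection must rank the original token above every other candidate in its set, which is one way to read the claim that it is more semantically informative.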

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Residual Connection · WordPiece · Attention Dropout · Dense Connections