Pre-Training Transformers as Energy-Based Cloze Models
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

TL;DR
Electric is an energy-based cloze model for text representation learning. It scores how likely each token is in its context without masking, which makes re-ranking efficient and clarifies what ELECTRA's pre-training learns.
Contribution
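The paper introduces Electric, an energy-based cloze model that assigns each input token a scalar energy score given its context. Electric is trained with an algorithm based on noise-contrastive estimation, and the paper shows how this objective relates to ELECTRA's pre-training method.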
Findings
Electric re-ranks speech recognition n-best lists better than language models.
It produces likelihood scores much faster than masked language models.
Electric effectively transfers to downstream NLP tasks.
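The re-ranking finding can be pictured with a minimal sketch. It assumes each hypothesis in an n-best list already comes with an ASR score and a list of per-token energies from an Electric-style model. The energies, the ASR scores, the `weight` parameter, and the function names are all hypothetical and chosen for illustration; the summary does not specify how scores are combined, so the linear interpolation here is an assumption.

```python
from typing import List, Tuple

# (hypothesis text, ASR log-score, per-token energies); all values are made up.
Hypothesis = Tuple[str, float, List[float]]


def sequence_score(token_energies: List[float]) -> float:
    """Lower energy means a more likely token, so negate the summed energies."""
    return -sum(token_energies)


def rerank(nbest: List[Hypothesis], weight: float = 0.5) -> List[str]:
    """Order hypotheses by ASR score plus a weighted energy-based score.

    The interpolation and the default weight are assumptions, not the
    paper's stated procedure.
    """
    scored = [
        (asr_score + weight * sequence_score(energies), text)
        for text, asr_score, energies in nbest
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # higher is better
    return [text for _, text in scored]


nbest = [
    ("recognize speech", -4.0, [0.5, 0.7]),
    ("wreck a nice beach", -3.8, [2.0, 1.5, 1.0, 2.5]),
]
best = rerank(nbest)[0]
```

Because the model scores every token without masking, the energies for a whole hypothesis can come from a single forward pass, whereas a masked language model needs to mask and score positions separately.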
Abstract
We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what…
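As a rough illustration of the noise-contrastive idea mentioned in the abstract, the sketch below implements a generic binary noise-contrastive estimation loss over scalar energies. It is not the paper's exact objective, which the abstract does not spell out. It assumes the unnormalized model log-score of a token is its negative energy, that a noise distribution supplies `log_q` for each token, and that one data token is contrasted against `k` noise tokens; all names and numbers are hypothetical.

```python
import math
from typing import List


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def nce_loss(
    data_energy: float,
    data_log_q: float,
    noise_energies: List[float],
    noise_log_qs: List[float],
) -> float:
    """Generic binary NCE loss for one data token and k noise tokens.

    The classifier logit is log p_hat(x) - log(k * q(x)), with the
    assumption log p_hat(x) = -energy(x).
    """
    k = len(noise_energies)

    def logit(energy: float, log_q: float) -> float:
        return -energy - math.log(k) - log_q

    # The data token should be classified as "real".
    loss = -log_sigmoid(logit(data_energy, data_log_q))
    # Each noise token should be classified as "noise": log(1 - sigmoid(z)).
    for energy, log_q in zip(noise_energies, noise_log_qs):
        loss += -log_sigmoid(-logit(energy, log_q))
    return loss


log_q = math.log(0.1)  # hypothetical noise probability for every token
good = nce_loss(0.0, log_q, [5.0, 5.0], [log_q, log_q])  # low energy on data
bad = nce_loss(5.0, log_q, [0.0, 0.0], [log_q, log_q])   # low energy on noise
```

The loss is small when the model gives low energy to the observed token and high energy to the noise tokens, and large in the reverse case.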
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
Methods: Linear Layer · Electric · WordPiece · Linear Warmup With Linear Decay · Attention Is All You Need · Layer Normalization · Dropout · Weight Decay · Dense Connections · ELECTRA
