COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

Yu Meng; Chenyan Xiong; Payal Bajaj; Saurabh Tiwary; Paul Bennett; Jiawei Han; Xia Song

arXiv:2102.08473 · cs.CL · October 28, 2021 · 129 cites

TL;DR

COCO-LM introduces a self-supervised pretraining framework for language models that combines correcting corrupted tokens and contrasting sequence representations, leading to improved accuracy and efficiency.

Contribution

It proposes a novel pretraining method built on a token-level correction task and a sequence-level contrastive task, outperforming existing models in both accuracy and pretraining efficiency.

Findings

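01 Outperforms state-of-the-art pretrained models on GLUE and SQuAD.

02 Matches ELECTRA's MNLI accuracy with half of ELECTRA's pretraining GPU hours.

03 Improves the GLUE average score by more than 1 point over the previous best models trained for the same number of pretraining steps.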

Abstract

We present a self-supervised learning framework, COCO-LM, that pretrains Language Models by COrrecting and COntrasting corrupted text sequences. Following ELECTRA-style pretraining, COCO-LM employs an auxiliary language model to corrupt text sequences, upon which it constructs two new tasks for pretraining the main model. The first token-level task, Corrective Language Modeling, is to detect and correct tokens replaced by the auxiliary model, in order to better capture token-level semantics. The second sequence-level task, Sequence Contrastive Learning, is to align text sequences originating from the same source input while ensuring uniformity in the representation space. Experiments on GLUE and SQuAD demonstrate that COCO-LM not only outperforms recent state-of-the-art pretrained models in accuracy, but also improves pretraining efficiency. It achieves the MNLI accuracy of ELECTRA with…
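The two objectives described in the abstract can be illustrated as loss functions. The following is a minimal, hypothetical Python sketch, not the authors' implementation. It assumes the main model has already produced, for each position, a single "was this token replaced?" logit and a vector of logits over the vocabulary, plus one pooled vector per sequence view. All function names, the temperature `tau`, and the choice to simply sum the detection and correction terms are illustrative assumptions; the paper's exact formulation of Corrective Language Modeling, and how it builds the two views for Sequence Contrastive Learning, may differ.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating the two COCO-LM objectives; not the authors' code.


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def corrective_lm_loss(
    detect_logits: Sequence[float],
    vocab_logits: Sequence[Sequence[float]],
    corrupted_ids: Sequence[int],
    original_ids: Sequence[int],
) -> float:
    """Token-level sketch: at every position, (1) detect whether the auxiliary
    model replaced the token and (2) predict the original token.

    Assumption: the two terms are simply summed; the paper may combine them differently.
    """
    total = 0.0
    for d, logits, cor, orig in zip(detect_logits, vocab_logits, corrupted_ids, original_ids):
        replaced = 1.0 if cor != orig else 0.0
        # Binary cross-entropy on the "replaced" logit, in the stable form
        # max(d, 0) - d * y + log(1 + exp(-|d|)).
        bce = max(d, 0.0) - d * replaced + math.log1p(math.exp(-abs(d)))
        # Correction: negative log-likelihood of the original token id.
        nll = -log_softmax(logits)[orig]
        total += bce + nll
    return total / len(original_ids)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)  # guard against zero vectors


def sequence_contrastive_loss(
    view_a: Sequence[Sequence[float]],
    view_b: Sequence[Sequence[float]],
    tau: float = 0.1,  # hypothetical temperature
) -> float:
    """InfoNCE sketch: view_a[i] and view_b[i] encode two views of the same
    source sequence (positive pair); every view_b[j] with j != i in the batch
    serves as a negative."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        sims = [cosine(view_a[i], view_b[j]) / tau for j in range(n)]
        total -= log_softmax(sims)[i]
    return total / n


if __name__ == "__main__":
    # Toy example: vocabulary of 4 ids; position 1 was replaced (2 -> 3).
    original = [0, 2, 1]
    corrupted = [0, 3, 1]
    detect = [-2.0, 2.0, -2.0]
    vocab = [[3.0, 0.0, 0.0, 0.0], [0.0, 0.0, 3.0, 0.0], [0.0, 3.0, 0.0, 0.0]]
    print("CLM loss:", round(corrective_lm_loss(detect, vocab, corrupted, original), 4))

    # Two sequences, two views each; matching rows are positive pairs.
    a = [[1.0, 0.0], [0.0, 1.0]]
    b = [[0.9, 0.1], [0.1, 0.9]]
    print("SCL loss:", round(sequence_contrastive_loss(a, b), 4))
```

In the contrastive term, the numerator pulls the two views of the same sequence together (alignment), while the softmax denominator pushes apart views of different sequences in the batch, which is what spreads the representations out (uniformity). This simplified variant only draws negatives from the other view's batch; a full implementation could also use in-view negatives.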

Videos

COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining (SlidesLive)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Contrastive Learning · Linear Warmup With Linear Decay · WordPiece · Weight Decay · Attention Dropout · Adam · Multi-Head Attention