COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

TL;DR
COCO-LM introduces a self-supervised pretraining framework for language models that combines correcting corrupted tokens and contrasting sequence representations, leading to improved accuracy and efficiency.
Contribution
It proposes a novel pretraining method using correction and contrastive tasks, outperforming existing models in accuracy and pretraining efficiency.
Findings
Outperforms state-of-the-art models on GLUE and SQuAD.
Matches ELECTRA's MNLI accuracy with half the pretraining GPU hours.
Improves GLUE average scores by over 1 point.
Abstract
We present a self-supervised learning framework, COCO-LM, that pretrains Language Models by COrrecting and COntrasting corrupted text sequences. Following ELECTRA-style pretraining, COCO-LM employs an auxiliary language model to corrupt text sequences, upon which it constructs two new tasks for pretraining the main model. The first token-level task, Corrective Language Modeling, is to detect and correct tokens replaced by the auxiliary model, in order to better capture token-level semantics. The second sequence-level task, Sequence Contrastive Learning, is to align text sequences originated from the same source input while ensuring uniformity in the representation space. Experiments on GLUE and SQuAD demonstrate that COCO-LM not only outperforms recent state-of-the-art pretrained models in accuracy, but also improves pretraining efficiency. It achieves the MNLI accuracy of ELECTRA with…
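The sequence-level objective described in the abstract can be illustrated with a small InfoNCE-style loss. This is only a sketch under stated assumptions, not the authors' implementation: the function name `sequence_contrastive_loss`, the use of cosine similarity, the `temperature` parameter, and the choice of treating every other sequence in the batch as a negative are all assumptions made for illustration. Pulling each pair from the same source together corresponds to alignment; pushing it away from the rest of the batch encourages uniformity.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sequence_contrastive_loss(view_a, view_b, temperature=1.0):
    """InfoNCE-style loss over a batch of paired sequence embeddings.

    view_a[i] and view_b[i] are assumed to be embeddings of two sequences
    derived from the same source input (the positive pair). Every other
    embedding in the batch is treated as a negative (an assumption).
    """
    if not view_a or len(view_a) != len(view_b):
        raise ValueError("view_a and view_b must be non-empty and equal length")

    n = len(view_a)
    vecs = [_normalize(v) for v in view_a] + [_normalize(v) for v in view_b]

    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the paired view
        # Similarities to every other embedding, positive included.
        logits = [
            _dot(vecs[i], vecs[j]) / temperature
            for j in range(2 * n)
            if j != i
        ]
        pos_logit = _dot(vecs[i], vecs[pos]) / temperature
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)
```

With this formulation, a batch whose pairs are correctly matched yields a lower loss than the same batch with the pairs shuffled, which is the behaviour the alignment term is meant to reward.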
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Linear Layer · Contrastive Learning · Linear Warmup With Linear Decay · WordPiece · Weight Decay · Attention Dropout · Adam · Multi-Head Attention
