Headless Language Models: Learning without Predicting with Contrastive Weight Tying

Nathan Godey; Éric de la Clergerie; Benoît Sagot

arXiv:2309.08351 · cs.CL · September 18, 2023

TL;DR

This paper introduces Contrastive Weight Tying (CWT), a pretraining method in which the model reconstructs its input embeddings contrastively instead of predicting a probability distribution over the vocabulary. The resulting headless language models cost less to train and perform better on downstream tasks.

Contribution

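It presents a contrastive pretraining approach for headless language models, applied in both monolingual and multilingual settings, that reduces training compute requirements by up to 20 times and improves GLUE and LAMBADA results over classical LMs within similar compute budgets.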

Findings

01 Training compute requirements reduced by up to 20 times.

02 GLUE score improved by +1.6 over classical LMs within similar compute budgets.

03 LAMBADA accuracy improved by +2.7 over classical LMs within similar compute budgets.

Abstract

Self-supervised pre-training of language models usually consists in predicting probability distributions over extensive token vocabularies. In this study, we propose an innovative method that shifts away from probability prediction and instead focuses on reconstructing input embeddings in a contrastive fashion via Contrastive Weight Tying (CWT). We apply this approach to pretrain Headless Language Models in both monolingual and multilingual contexts. Our method offers practical advantages, substantially reducing training computational requirements by up to 20 times, while simultaneously enhancing downstream performance and data efficiency. We observe a significant +1.6 GLUE score increase and a notable +2.7 LAMBADA accuracy improvement compared to classical LMs within similar compute budgets.
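
The idea of reconstructing input embeddings contrastively can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes dot-product similarity and in-batch negatives, neither of which the abstract specifies, and the names `contrastive_weight_tying_loss`, `outputs`, `target_ids`, and `embeddings` are hypothetical. Each output vector is scored against the input embeddings of the target tokens in the batch, so no projection onto the full vocabulary is needed.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_weight_tying_loss(outputs, target_ids, embeddings):
    """Mean contrastive loss over a batch of positions.

    outputs    -- list of N model output vectors (one per position)
    target_ids -- list of N token ids the outputs should reconstruct
    embeddings -- input embedding table, indexed by token id

    Assumption: the positive for position i is the input embedding of
    its own target token; the negatives are the input embeddings of the
    other targets in the batch. Repeated token ids in a batch are not
    treated specially in this sketch.
    """
    # Weight tying: candidates come from the input embedding table itself.
    candidates = [embeddings[t] for t in target_ids]
    total = 0.0
    for i, out in enumerate(outputs):
        scores = [dot(out, c) for c in candidates]
        # Numerically stable log-sum-exp over the in-batch candidates.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        # Negative log-probability of picking the correct candidate.
        total += log_norm - scores[i]
    return total / len(outputs)


# Toy usage: a 4-token vocabulary with 3-dimensional embeddings.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
target_ids = [0, 1, 2]

# Uninformative outputs give a loss of log(N), here log(3).
uninformative = [[0.0, 0.0, 0.0] for _ in target_ids]
print(contrastive_weight_tying_loss(uninformative, target_ids, embeddings))

# Outputs aligned with their target embeddings drive the loss toward 0.
aligned = [[10.0 * x for x in embeddings[t]] for t in target_ids]
print(contrastive_weight_tying_loss(aligned, target_ids, embeddings))
```

In this sketch each position is scored against N in-batch candidates rather than against every vocabulary entry, which is one way such an objective could avoid the cost of a vocabulary-sized output head. The toy numbers are illustrative only and say nothing about the reported speedup.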

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Weight Tying