Efficient Training of Language Models to Fill in the Middle

Mohammad Bavarian; Heewoo Jun; Nikolas Tezak; John Schulman; Christine McLeavey; Jerry Tworek; Mark Chen

arXiv:2207.14255·cs.CL·July 29, 2022·44 cites


TL;DR

This paper demonstrates that autoregressive language models can be effectively trained to perform fill-in-the-middle tasks using a simple data transformation, without sacrificing their original generative abilities, and provides best practices for training such models.

Contribution

The authors introduce a straightforward data augmentation method for training language models to fill in text spans in the middle, showing it does not harm traditional capabilities and offering practical training guidelines.

Findings

01. FIM training does not impair left-to-right generation performance.

02. Best practices for FIM model training are established through ablations.

03. The authors released a state-of-the-art FIM model and benchmarks for future research.

Abstract

We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset, which simply moves a span of text from the middle of a document to its end. While this data augmentation has garnered much interest in recent years, we provide extensive evidence that training models with a large fraction of data transformed in this way does not harm the original left-to-right generative capability, as measured by perplexity and sampling evaluations across a wide range of scales. Given the usefulness, simplicity, and efficiency of training models to fill-in-the-middle (FIM), we suggest that future autoregressive language models be trained with FIM by default. To this end, we run a series of ablations on key hyperparameters, such as the data transformation frequency, the structure of the transformation, and the method of selecting the…
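The transformation the abstract describes, moving a span from the middle of a document to its end, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the sentinel strings `<PRE>`, `<SUF>`, and `<MID>`, the function names, character-level splitting, and the default `fim_rate` of 0.5 are all placeholders chosen here for the example.

```python
import random

# Hypothetical sentinel strings (an assumption for this sketch); a real
# setup would reserve dedicated special tokens in the tokenizer vocabulary.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def fim_transform(doc, rng, fim_rate=0.5):
    """With probability fim_rate, move a random middle span of doc to its end."""
    if rng.random() >= fim_rate:
        # Leave the document as an ordinary left-to-right training example.
        return doc
    # Pick two cut points uniformly at random (character level, for simplicity).
    i, j = sorted(rng.randint(0, len(doc)) for _ in range(2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # Reorder to prefix, suffix, middle so a left-to-right model learns to
    # generate the middle while conditioning on both sides.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


def fim_restore(example):
    """Invert fim_transform; assumes the document contains no sentinel strings."""
    if not example.startswith(PRE):
        return example
    prefix, rest = example[len(PRE):].split(SUF, 1)
    suffix, middle = rest.split(MID, 1)
    return prefix + middle + suffix


rng = random.Random(0)
doc = "def add(a, b):\n    return a + b\n"
transformed = fim_transform(doc, rng, fim_rate=1.0)
assert fim_restore(transformed) == doc
```

Here `fim_rate` stands in for the "data transformation frequency" hyperparameter mentioned in the abstract. At inference time, a model trained this way would be prompted with the prefix and suffix followed by the middle sentinel, and asked to generate the missing span.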

Code & Models

Videos

OpenAI’s New AI: Video Game Addict No More! 🤖· youtube

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods