Efficient Training of Language Models to Fill in the Middle
Mohammad Bavarian, Heewoo Jun, Nikolas Tezak, John Schulman, Christine McLeavey, Jerry Tworek, Mark Chen

TL;DR
This paper demonstrates that autoregressive language models can be effectively trained to perform fill-in-the-middle tasks using a simple data transformation, without sacrificing their original generative abilities, and provides best practices for training such models.
Contribution
The authors introduce a straightforward data augmentation method for training language models to fill in text spans in the middle, showing it does not harm traditional capabilities and offering practical training guidelines.
Findings
FIM training does not impair left-to-right generation performance.
Best practices for FIM model training are established through ablations.
The authors release a state-of-the-art FIM model and benchmarks for future research.
Abstract
We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset, which simply moves a span of text from the middle of a document to its end. While this data augmentation has garnered much interest in recent years, we provide extensive evidence that training models with a large fraction of data transformed in this way does not harm the original left-to-right generative capability, as measured by perplexity and sampling evaluations across a wide range of scales. Given the usefulness, simplicity, and efficiency of training models to fill-in-the-middle (FIM), we suggest that future autoregressive language models be trained with FIM by default. To this end, we run a series of ablations on key hyperparameters, such as the data transformation frequency, the structure of the transformation, and the method of selecting the…
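The transformation described in the abstract (moving a span from the middle of a document to its end) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the sentinel strings `<PRE>`, `<SUF>`, `<MID>`, the function names, the character-level span selection, and the default `fim_rate` of 0.5 are all assumptions made here for the example. A real pipeline would reserve dedicated special tokens in the tokenizer and pick the rate and span-selection method based on the paper's ablations.

```python
import random

# Hypothetical sentinel strings marking the three segments (an assumption;
# a real setup would use reserved special tokens in the tokenizer).
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def fim_transform(doc: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, move a random middle span of doc to the end.

    Otherwise the document is returned unchanged, so it remains an ordinary
    left-to-right training example.
    """
    if not doc or rng.random() >= fim_rate:
        return doc
    # Two distinct cut points split the document into prefix, middle, suffix.
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # The model sees prefix and suffix first, then learns to emit the middle.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


def fim_restore(example: str) -> str:
    """Invert fim_transform, assuming the sentinels never occur in the text."""
    if not example.startswith(PRE):
        return example
    prefix, rest = example[len(PRE):].split(SUF, 1)
    suffix, middle = rest.split(MID, 1)
    return prefix + middle + suffix
```

Because the transformed example is still a plain sequence, it can be fed to a standard next-token training loop without architectural changes. At inference time, under the same assumptions, one would prompt with the prefix and suffix followed by the `<MID>` sentinel and let the model generate the missing span.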
Code & Models
- 🤗 bigcode/starcoder · model · 10k dl · ♡ 2932
- 🤗 bigcode/starcoder2-15b · model · 5.2k dl · ♡ 665
- 🤗 CarperAI/FIM-NeoX-1.3B · model · 37 dl · ♡ 26
- 🤗 bigcode/santacoder · model · 7.0k dl · ♡ 335
- 🤗 olivierdehaene/optimized-santacoder · model · 19 dl · ♡ 8
- 🤗 mrm8488/santacoder-finetuned-the-stack-bash-shell · model · 19 dl · ♡ 5
- 🤗 muhtasham/santacoder-finetuned-the-stack-assembly · model · 11 dl · ♡ 1
- 🤗 mrm8488/santacoder-finetuned-the-stack-swift · model · 8 dl · ♡ 1
- 🤗 muhtasham/santacoder-finetuned-the-stack-cobol · model · 18 dl · ♡ 5
- 🤗 mrm8488/santacoder-finetuned-the-stack-clojure · model · 16 dl · ♡ 1
Videos
OpenAI’s New AI: Video Game Addict No More! 🤖 · YouTube
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
