OPT: Open Pre-trained Transformer Language Models

Susan Zhang; Stephen Roller; Naman Goyal; Mikel Artetxe; Moya Chen; Shuohui Chen; Christopher Dewan; Mona Diab; Xian Li; Xi Victoria Lin; Todor Mihaylov; Myle Ott; Sam Shleifer; Kurt Shuster; Daniel Simig; Punit Singh Koura; Anjali Sridhar; Tianlu Wang; Luke Zettlemoyer

arXiv:2205.01068 · cs.CL · June 22, 2022

TL;DR

The paper introduces OPT, a suite of openly shared, decoder-only pre-trained transformer models ranging from 125M to 175B parameters, intended to broaden research access to large language models; the authors report that OPT-175B took about 1/7th of GPT-3's carbon footprint to develop.

Contribution

It provides a large, open collection of transformer models with detailed documentation and tools, enabling broader research and responsible sharing of powerful language models.

Findings

01 OPT-175B is comparable to GPT-3 in performance

02 Developing OPT-175B required only 1/7th of GPT-3's carbon footprint

03 The full model suite, the training logbook, and code for experimenting with the models are being released

Abstract

Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
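
To give a feel for the 125M to 175B range quoted above, the following is a rough back-of-the-envelope sketch of how a decoder-only transformer's parameter count scales with depth and width. The layer counts, hidden sizes, vocabulary size, and context length used here are assumptions (they do not appear on this page), and the formula ignores biases and layer-norm weights, so treat the results as approximations rather than official figures.

```python
def approx_decoder_params(
    n_layers: int,
    d_model: int,
    vocab_size: int = 50272,   # assumed vocabulary size, not stated on this page
    n_positions: int = 2048,   # assumed context length, not stated on this page
) -> int:
    """Rough parameter count for a decoder-only transformer.

    Ignores biases and layer-norm weights, and assumes a feed-forward
    hidden width of 4 * d_model.
    """
    attention = 4 * d_model * d_model      # Q, K, V and output projections
    feed_forward = 8 * d_model * d_model   # two linear layers, 4x hidden width
    embeddings = (vocab_size + n_positions) * d_model  # token + positional tables
    return n_layers * (attention + feed_forward) + embeddings


# Hypothetical configurations for the smallest and largest models in the suite.
small = approx_decoder_params(n_layers=12, d_model=768)
large = approx_decoder_params(n_layers=96, d_model=12288)

print(f"smallest: ~{small / 1e6:.0f}M parameters")
print(f"largest:  ~{large / 1e9:.0f}B parameters")
```

Under these assumed settings the estimates land close to the two ends of the stated range. The per-layer term grows with the square of the hidden size, which is why the largest model is dominated by its transformer blocks rather than its embedding tables.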

Code & Models

Videos

[ML News] Meta's OPT 175B language model | DALL-E Mega is training | TorToiSe TTS fakes my voice · youtube

GPT-NeoX-20B | BigScience BLOOM | OPT-175B | Training Large Language Models | Papers Explained · youtube

Taxonomy

Methods

Attention Is All You Need · OPT · Linear Layer · Multi-Head Attention · Layer Normalization · Softmax · Dropout · Adam · Byte Pair Encoding