OPT: Open Pre-trained Transformer Language Models
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer

TL;DR
The paper introduces OPT, a suite of open, decoder-only pre-trained transformer models ranging from 125M to 175B parameters, shared so that researchers can study large language models directly. The largest model, OPT-175B, was developed with only 1/7th of GPT-3's carbon footprint.
Contribution
It provides a large, open collection of transformer models with detailed documentation and tools, enabling broader research and responsible sharing of powerful language models.
Findings
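OPT-175B is comparable to GPT-3 in performance
Developing OPT-175B required only 1/7th of GPT-3's carbon footprint
The model suite, the training logbook, and code for experimenting with the models are being released to researchers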
Abstract
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
Code & Models
- 🤗 facebook/opt-2.7b · model · 21k dl · ♡ 87
- 🤗 facebook/opt-125m · model · 7.0M dl · ♡ 236
- 🤗 facebook/opt-350m · model · 170k dl · ♡ 149
- 🤗 facebook/opt-1.3b · model · 332k dl · ♡ 182
- 🤗 facebook/opt-6.7b · model · 29k dl · ♡ 118
- 🤗 facebook/opt-13b · model · 16k dl · ♡ 65
- 🤗 facebook/opt-30b · model · 12k dl · ♡ 136
- 🤗 facebook/opt-66b · model · 8.2k dl · ♡ 174
- 🤗 KoboldAI/OPT-6B-nerys-v2 · model · 852 dl · ♡ 24
- 🤗 model-attribution-challenge/opt-350m · model · 14 dl
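
The checkpoints listed above can be tried out with the Hugging Face `transformers` library. What follows is a minimal sketch, not the authors' own tooling. The helper names `opt_model_id` and `generate` are hypothetical, and the sketch assumes `transformers` and PyTorch are installed and that the Hub is reachable when `generate` is called.

```python
"""Sketch: load one of the OPT checkpoints listed above from the Hugging Face Hub."""

# Sizes taken from the facebook/* entries in the list above.
OPT_SIZES = ("125m", "350m", "1.3b", "2.7b", "6.7b", "13b", "30b", "66b")


def opt_model_id(size: str) -> str:
    """Return the Hub repo ID for an OPT size, e.g. "125m" -> "facebook/opt-125m".

    Hypothetical helper for illustration; it only formats and validates the name.
    """
    size = size.lower()
    if size not in OPT_SIZES:
        raise ValueError(f"unknown OPT size {size!r}; expected one of {OPT_SIZES}")
    return f"facebook/opt-{size}"


def generate(prompt: str, size: str = "125m", max_new_tokens: int = 20) -> str:
    """Download the checkpoint (needs network access) and continue `prompt`."""
    # Imported lazily so opt_model_id works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = opt_model_id(size)
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo)

    # Tokenize to PyTorch tensors and decode the generated continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Large language models are")` would fetch the smallest checkpoint, `facebook/opt-125m`, on first use. The larger sizes need far more memory and download time, so starting with the smallest is probably the sensible default.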
Videos
[ML News] Meta's OPT 175B language model | DALL-E Mega is training | TorToiSe TTS fakes my voice· youtube
GPT-NeoX-20B | BigScience BLOOM | OPT-175B | Training Large Language Models | Papers Explained· youtube
Taxonomy
Methods: Attention Is All You Need · OPT · Linear Layer · Multi-Head Attention · Layer Normalization · Softmax · Dropout · Adam · Byte Pair Encoding
