GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach

TL;DR
GPT-NeoX-20B is an open-source, 20-billion-parameter autoregressive language model trained on the Pile. It is a strong few-shot reasoner and outperforms comparably sized models on a range of tasks.
Contribution
This work introduces GPT-NeoX-20B, the largest dense autoregressive model with publicly available weights at the time of submission, and describes its architecture, training methodology, and performance evaluation in detail.
Findings
GPT-NeoX-20B excels as a few-shot reasoner.
It gains considerably more from five-shot evaluation than similarly sized GPT-3 and FairSeq models.
The model and training code are openly available for research.
Abstract
We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B's architecture and training and evaluate its performance on a range of language-understanding, mathematics, and knowledge-based tasks. We find that GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models. We open-source the training and evaluation code, as well as the model weights, at https://github.com/EleutherAI/gpt-neox.
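The abstract contrasts zero-shot with five-shot evaluation. In k-shot evaluation the model sees k solved examples before the unsolved query and must continue the text. The sketch below is a minimal illustration that assumes a simple "Question:/Answer:" template; the function name `build_k_shot_prompt` and the template strings are hypothetical, and the actual prompt formats used in the paper's evaluation code are not given here and vary by task.

```python
from typing import List, Tuple


def build_k_shot_prompt(
    demos: List[Tuple[str, str]],
    query: str,
    k: int = 5,
    q_prefix: str = "Question: ",
    a_prefix: str = "Answer: ",
    sep: str = "\n\n",
) -> str:
    """Hypothetical helper: k solved demonstrations followed by the unsolved query.

    k=0 gives a zero-shot prompt; k=5 gives a five-shot prompt.
    """
    if k > len(demos):
        raise ValueError(f"need at least {k} demonstrations, got {len(demos)}")
    # Each demonstration shows both the question and its answer.
    blocks = [f"{q_prefix}{q}\n{a_prefix}{a}" for q, a in demos[:k]]
    # The query ends at the answer prefix so the model generates the answer.
    blocks.append(f"{q_prefix}{query}\n{a_prefix.rstrip()}")
    return sep.join(blocks)


if __name__ == "__main__":
    demos = [(f"What is {i} + {i}?", str(2 * i)) for i in range(1, 6)]
    print(build_k_shot_prompt(demos, "What is 7 + 7?", k=5))
```

The trailing space after the answer prefix is stripped for the query on the assumption that the tokenizer attaches the leading space to the first generated token, which is common for BPE tokenizers but should be checked for any specific setup.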
Code & Models
- 🤗 EleutherAI/gpt-neox-20b · model · 264k downloads · ♡ 580
- 🤗 KoboldAI/GPT-NeoX-20B-Skein · model · 899 downloads · ♡ 11
- 🤗 KoboldAI/GPT-NeoX-20B-Erebus · model · 1.2k downloads · ♡ 86
- 🤗 CarperAI/FIM-NeoX-1.3B · model · 37 downloads · ♡ 26
- 🤗 Upword/gpt-neox-20b-embeddings · model · 10 downloads
- 🤗 KoboldAI/GPT-NeoX-20B-Erebus-GGML · model · ♡ 27
- 🤗 michaelfeil/ct2fast-gpt-neox-20b · model · 2 downloads
- 🤗 Dampish/StellarX-4B-V0 · model · 909 downloads · ♡ 1
- 🤗 Dampish/StellarX-4B-V0.2 · model · 905 downloads · ♡ 2
- 🤗 papahawk/gpt-neox-20b · model
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Multi-Head Attention · Attention Is All You Need · GPT-NeoX · Linear Layer · GPT-Neo · Cosine Annealing · Weight Decay · Dropout · Adam · Byte Pair Encoding
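Cosine annealing appears among the listed methods. The sketch below shows a generic learning-rate schedule with a linear warmup followed by cosine decay from `max_lr` to `min_lr`. The function name `cosine_lr`, the warmup phase, and the `min_lr` floor are illustrative assumptions; this is not a statement of the hyperparameters used to train GPT-NeoX-20B.

```python
import math


def cosine_lr(step: int, max_lr: float, total_steps: int,
              warmup_steps: int = 0, min_lr: float = 0.0) -> float:
    """Hypothetical schedule: linear warmup, then cosine annealing to min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp that reaches max_lr at the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1] so that
    # steps past total_steps stay at min_lr.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine factor goes from 1 (start of decay) to 0 (end of decay).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


if __name__ == "__main__":
    for s in (0, 5, 10, 55, 100):
        print(s, round(cosine_lr(s, max_lr=1e-4, total_steps=100, warmup_steps=10), 8))
```

Decaying to a nonzero `min_lr` rather than to zero is a common design choice because it keeps the final updates from vanishing entirely; the right floor depends on the training run.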
