GPT-NeoX-20B: An Open-Source Autoregressive Language Model

Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach

arXiv:2204.06745 · cs.CL · April 15, 2022 · 69 citations

TL;DR

GPT-NeoX-20B is an open-source, 20-billion-parameter autoregressive language model trained on the Pile. It is a strong few-shot reasoner and gains more from five-shot prompting than similarly sized GPT-3 and FairSeq models.

Contribution

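This work introduces GPT-NeoX-20B, to the authors' knowledge the largest dense autoregressive model with publicly available weights at the time of submission, and describes its architecture, training methodology, and performance on language-understanding, mathematics, and knowledge-based tasks.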

Findings

01. GPT-NeoX-20B is a particularly strong few-shot reasoner.
02. It gains more from five-shot prompting than similarly sized GPT-3 and FairSeq models do.
03. The model weights and the training and evaluation code are openly available for research.

Abstract

We introduce GPT-NeoX-20B, a 20-billion-parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B's architecture and training and evaluate its performance on a range of language-understanding, mathematics, and knowledge-based tasks. We find that GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models. We open-source the training and evaluation code, as well as the model weights, at https://github.com/EleutherAI/gpt-neox.
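Five-shot evaluation means the model sees five solved examples in its context before the test item and must continue the text with its answer. The following is a minimal sketch of how such a prompt can be assembled, assuming a simple "Question:/Answer:" template. The function name `build_k_shot_prompt` and the template strings are hypothetical. This is not the authors' evaluation code, and real prompt formats are task-specific. Evaluation harnesses also typically sample the demonstrations at random, whereas this sketch takes the first k for determinism.

```python
from typing import Sequence, Tuple


def build_k_shot_prompt(
    demos: Sequence[Tuple[str, str]],
    query: str,
    k: int = 5,
    q_prefix: str = "Question:",
    a_prefix: str = "Answer:",
    sep: str = "\n\n",
) -> str:
    """Build a k-shot prompt for an autoregressive language model.

    Hypothetical template: each solved demonstration is rendered as
    "Question: ...\nAnswer: ...", and the final query ends with a bare
    "Answer:" so the model's continuation is its answer.
    k=0 yields a zero-shot prompt; the default k=5 mirrors five-shot evaluation.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if k > len(demos):
        raise ValueError(f"need at least {k} demonstrations, got {len(demos)}")
    # Render the first k (question, answer) pairs as solved examples.
    blocks = [f"{q_prefix} {q}\n{a_prefix} {a}" for q, a in demos[:k]]
    # Append the unanswered query; the answer slot is left for the model.
    blocks.append(f"{q_prefix} {query}\n{a_prefix}")
    return sep.join(blocks)


if __name__ == "__main__":
    examples = [("2+2?", "4"), ("3+3?", "6")]
    print(build_k_shot_prompt(examples, "5+5?", k=2))
```

Comparing zero-shot and five-shot scores then amounts to calling the same function with `k=0` and `k=5` on the same query and scoring the model's continuation in each case.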

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)

Methods: Multi-Head Attention · Attention Is All You Need · GPT-NeoX · Linear Layer · GPT-Neo · Cosine Annealing · Weight Decay · Dropout · Adam · Byte Pair Encoding