PaLM: Scaling Language Modeling with Pathways

Aakanksha Chowdhery; Sharan Narang; Jacob Devlin; Maarten Bosma; Gaurav Mishra; Adam Roberts; Paul Barham; Hyung Won Chung; Charles Sutton; Sebastian Gehrmann; Parker Schuh; Kensen Shi; Sasha Tsvyashchenko; Joshua Maynez; Abhishek Rao; Parker Barnes; Yi Tay; Noam Shazeer; Vinodkumar Prabhakaran; Emily Reif; Nan Du; Ben Hutchinson; Reiner Pope; James Bradbury; Jacob Austin; Michael Isard; Guy Gur-Ari; Pengcheng Yin; Toju Duke; Anselm Levskaya; Sanjay Ghemawat; Sunipa Dev; Henryk Michalewski; Xavier Garcia; Vedant Misra; Kevin Robinson; Liam Fedus; Denny Zhou; Daphne Ippolito; David Luan; Hyeontaek Lim; Barret Zoph; Alexander Spiridonov; Ryan Sepassi; David Dohan; Shivani Agrawal; Mark Omernick; Andrew M. Dai; Thanumalayan Sankaranarayana Pillai; Marie Pellat; Aitor Lewkowycz; Erica Moreira; Rewon Child; Oleksandr Polozov; Katherine Lee; Zongwei Zhou; Xuezhi Wang; Brennan Saeta; Mark Diaz; Orhan Firat; Michele Catasta; Jason Wei; Kathy Meier-Hellstern; Douglas Eck; Jeff Dean; Slav Petrov; Noah Fiedel

arXiv:2204.02311 · cs.CL · October 6, 2022 · 2.1k cites

Open Access

TL;DR

The paper introduces PaLM, a 540-billion-parameter language model trained with Pathways, and demonstrates state-of-the-art few-shot learning, multilingual, and code generation capabilities, along with an analysis of bias, toxicity, and ethical considerations.

Contribution

It presents the development and training of PaLM, a large-scale language model trained using Pathways, which sets new state-of-the-art results on many benchmarks and offers insights into how model scale affects both performance and safety.

Findings

01

Achieved state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.

02

Outperformed the fine-tuned state of the art on a suite of multi-step reasoning tasks, and exceeded average human performance on the BIG-bench benchmark.

03

Showed that many BIG-bench tasks improve discontinuously with scale, with performance rising steeply at the largest model size.

Abstract

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite…

Code & Models

Videos

PaLM Pathways Language Model explained | 540 Billion parameters can explain jokes!? · YouTube

8 Ways ChatGPT 4 [Is] Better Than ChatGPT · YouTube

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Parallel Layers · Adafactor · SentencePiece · Rotary Position Embedding · SwiGLU · Multi-Query Attention · Pathways Language Model · Dropout · Layer Normalization
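Several of the methods listed above are architectural choices inside each Transformer block. As a hedged illustration only, here is a minimal pure-Python sketch of one decoder block that combines four of them: the parallel formulation (attention and MLP both read the same normalized input and are summed into the residual), a SwiGLU feed-forward layer, multi-query attention (separate query projections per head, one shared key and value projection), and rotary position embeddings. All function names, dimensions, and the random initialization are hypothetical choices for this sketch. It is not PaLM's implementation: it omits learned layer-norm parameters, training, and any parallelism, and favors readability over speed.

```python
import math
import random

def matvec(w, x):
    """Multiply matrix w (a list of rows, shape out x in) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def layer_norm(x, eps=1e-6):
    """Normalize x to zero mean and unit variance (learned scale omitted in this sketch)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: W_down(Swish(W_gate x) * (W_up x))."""
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    hidden = [g / (1.0 + math.exp(-g)) * u for g, u in zip(gate, up)]  # Swish(g) = g * sigmoid(g)
    return matvec(w_down, hidden)

def rope(vec, pos, base=10000.0):
    """Rotary embedding: rotate each pair (i, i+1), i even, by angle pos * base**(-i/d)."""
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out

def multi_query_attention(xs, wq_heads, wk, wv, wo):
    """Causal multi-query attention: one query projection per head,
    but a single key projection and a single value projection shared by all heads."""
    keys = [rope(matvec(wk, x), t) for t, x in enumerate(xs)]
    values = [matvec(wv, x) for x in xs]
    head_dim = len(keys[0])
    outputs = []
    for t, x in enumerate(xs):
        concat = []
        for wq in wq_heads:
            q = rope(matvec(wq, x), t)
            # Causal mask: position t attends only to positions 0..t.
            scores = [sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(head_dim)
                      for s in range(t + 1)]
            m = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(sc - m) for sc in scores]
            total = sum(weights)
            concat.extend(sum(weights[s] / total * values[s][j] for s in range(t + 1))
                          for j in range(head_dim))
        outputs.append(matvec(wo, concat))
    return outputs

def parallel_block(xs, p):
    """Parallel formulation: y = x + MLP(LN(x)) + Attention(LN(x)),
    instead of the serial y = x + MLP(LN(x + Attention(LN(x))))."""
    normed = [layer_norm(x) for x in xs]
    attn = multi_query_attention(normed, p["wq_heads"], p["wk"], p["wv"], p["wo"])
    ys = []
    for x, n, a in zip(xs, normed, attn):
        m = swiglu_mlp(n, p["w_gate"], p["w_up"], p["w_down"])
        ys.append([xi + mi + ai for xi, mi, ai in zip(x, m, a)])
    return ys

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def init_params(d_model=8, n_heads=2, head_dim=4, d_ff=16, seed=0):
    """Hypothetical toy sizes; real models are many orders of magnitude larger."""
    rng = random.Random(seed)
    return {
        "wq_heads": [random_matrix(head_dim, d_model, rng) for _ in range(n_heads)],
        "wk": random_matrix(head_dim, d_model, rng),
        "wv": random_matrix(head_dim, d_model, rng),
        "wo": random_matrix(d_model, n_heads * head_dim, rng),
        "w_gate": random_matrix(d_ff, d_model, rng),
        "w_up": random_matrix(d_ff, d_model, rng),
        "w_down": random_matrix(d_model, d_ff, rng),
    }

if __name__ == "__main__":
    rng = random.Random(1)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]  # 5 tokens, d_model = 8
    ys = parallel_block(xs, init_params())
    print(len(ys), len(ys[0]))  # prints: 5 8
```

Because both branches in `parallel_block` consume the same normalized input, the attention and MLP computations do not depend on each other and can run side by side rather than in sequence. Sharing one key and value projection across heads in `multi_query_attention` means only a single key/value vector per position has to be cached during autoregressive decoding, instead of one per head.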