PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer

TL;DR
The paper introduces PaLM, a 540-billion-parameter language model trained with Pathways, and demonstrates state-of-the-art few-shot, multilingual, and code-generation performance, along with an analysis of bias, toxicity, and ethical considerations.
Contribution
It presents the development and training of PaLM, a large-scale language model trained using Pathways, and reports new state-of-the-art benchmark results along with insights into how scale affects performance and safety.
Findings
Achieved state-of-the-art few-shot learning results on numerous benchmarks.
Outperformed the fine-tuned state of the art on a suite of multi-step reasoning tasks, and average human performance on the BIG-bench benchmark.
Showed that performance on a number of tasks increases steeply with model scale.
Abstract
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite…
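Example (few-shot prompting)
The abstract describes few-shot learning as adapting a model with only a handful of task-specific examples. A minimal sketch of what that can look like in practice: the examples are placed directly in the prompt ahead of the new query. The function name `build_few_shot_prompt` and the `Q:`/`A:` template are illustrative assumptions, not the prompt format used in the paper, and no model is called here.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, str]],
    query: str,
    instruction: str = "",
) -> str:
    """Format k solved examples followed by one unsolved query.

    The "Q:"/"A:" layout is a hypothetical template chosen for
    illustration; it is not taken from the PaLM paper.
    """
    parts = []
    if instruction:
        parts.append(instruction.strip())
    # Each solved example becomes one question/answer pair.
    for question, answer in examples:
        parts.append(f"Q: {question}\nA: {answer}")
    # The final entry leaves the answer open for the model to complete.
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)


# Two-shot prompt: two solved examples, then the new query.
demo = [("What is 2 + 3?", "5"), ("What is 7 + 6?", "13")]
prompt = build_few_shot_prompt(demo, "What is 9 + 4?")
print(prompt)
```

The resulting string would be passed to a language model as-is; the number of entries in `demo` is the "k" in k-shot.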
Code & Models
- 🤗 cyrilzhang/gpt2-numfix · model · 24 dl
- 🤗 HuggingFaceM4/idefics-80b · model · 331 dl · ♡ 69
- 🤗 HuggingFaceM4/idefics-9b · model · 1.9k dl · ♡ 47
- 🤗 HuggingFaceM4/idefics-9b-instruct · model · 1.2k dl · ♡ 107
- 🤗 HuggingFaceM4/idefics-80b-instruct · model · 5.3k dl · ♡ 189
- 🤗 xverse/XVERSE-65B · model · 43 dl · ♡ 38
- 🤗 jbochi/madlad400-8b-lm · model · 16 dl · ♡ 8
- 🤗 google/madlad400-8b-lm · model · 245 dl · ♡ 11
- 🤗 xverse/XVERSE-65B-2 · model · 23 dl · ♡ 10
- 🤗 norallm/normistral-7b-scratch · model · 439 dl · ♡ 9
Videos
PaLM Pathways Language Model explained | 540 Billion parameters can explain jokes!? · YouTube
8 Ways ChatGPT 4 [Is] Better Than ChatGPT · YouTube
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Parallel Layers · Adafactor · SentencePiece · Rotary Position Embedding · SwiGLU · Multi-Query Attention · Pathways Language Model · Dropout · Layer Normalization
