PanGu-$\alpha$: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
Wei Zeng, Xiaozhe Ren, Teng Su, Hui Wang, Yi Liao, Zhiwei Wang, Xin Jiang, ZhenZhang Yang, Kaisheng Wang, Xiaoda Zhang, Chen Li, Ziyan Gong, Yifan Yao, Xinjing Huang, Jun Wang, Jianfeng Yu, Qi Guo, Yue Yu, Yan Zhang, Jin Wang, Hengtao Tao, Dasen Yan, Zexuan Yi, Fang Peng

TL;DR
This paper introduces PanGu-$\alpha$, a large-scale autoregressive Chinese language model with 200 billion parameters, trained on extensive data and optimized with advanced parallelism techniques, demonstrating strong few-shot and zero-shot NLP performance.
Contribution
The paper presents the development and training of PanGu-$\alpha$, a Chinese language model with up to 200 billion parameters, trained with a MindSpore Auto-parallel strategy on a cluster of 2048 Ascend 910 AI processors.
Findings
PanGu-$\alpha$ achieves strong few-shot and zero-shot performance on NLP tasks.
The 200-billion-parameter model is trained efficiently using MindSpore auto-parallelism.
The model generalizes well across diverse Chinese NLP tasks.
Abstract
Large-scale Pretrained Language Models (PLMs) have become the new paradigm for Natural Language Processing (NLP). PLMs with hundreds of billions of parameters, such as GPT-3, have demonstrated strong performance on natural language understanding and generation with \textit{few-shot in-context} learning. In this work, we present our practice on training large-scale autoregressive language models named PanGu-$\alpha$, with up to 200 billion parameters. PanGu-$\alpha$ is developed under MindSpore and trained on a cluster of 2048 Ascend 910 AI processors. The training parallelism strategy is implemented based on MindSpore Auto-parallel, which composes five parallelism dimensions to scale the training task to 2048 processors efficiently: data parallelism, op-level model parallelism, pipeline model parallelism, optimizer model parallelism, and rematerialization. To enhance the…
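As a rough illustration of what composing parallelism dimensions over 2048 processors involves, the sketch below enumerates ways to factor a device count into data-parallel, op-level model-parallel, and pipeline-parallel degrees. It is not the paper's configuration and does not use MindSpore. The function name `parallel_layouts` is hypothetical. The sketch also assumes that optimizer parallelism and rematerialization do not consume a device dimension of their own, which is an assumption here and is not stated in the abstract.

```python
def parallel_layouts(num_devices):
    """Enumerate (data, op_level, pipeline) degrees whose product is num_devices.

    Hypothetical helper for illustration only. It assumes the three
    device-consuming dimensions multiply to the cluster size, and that
    optimizer parallelism and rematerialization add no device dimension.
    """
    divisors = [d for d in range(1, num_devices + 1) if num_devices % d == 0]
    layouts = []
    for data in divisors:
        remaining = num_devices // data
        for op_level in divisors:
            # op_level must evenly divide the devices left after data parallelism
            if remaining % op_level == 0:
                pipeline = remaining // op_level
                layouts.append((data, op_level, pipeline))
    return layouts


if __name__ == "__main__":
    layouts = parallel_layouts(2048)
    print(f"{len(layouts)} candidate layouts for 2048 devices")
    # Show a few candidates; none of these is claimed to be the paper's choice.
    for data, op_level, pipeline in layouts[:5]:
        print(f"data={data}, op_level={op_level}, pipeline={pipeline}")
```

Every layout returned satisfies data × op_level × pipeline = 2048. Choosing among them in practice depends on memory per device and communication cost, which this sketch does not model.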
Videos
[ML News] EU regulates AI, China trains 1.75T model, Google's oopsie, Everybody cheers for fraud. · YouTube
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Transformer · PanGu-$\alpha$ · Linear Layer · Multi-Head Attention · Byte Pair Encoding
