CPM: A Large-scale Generative Chinese Pre-trained Language Model
Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li

TL;DR
This paper introduces CPM, a large-scale Chinese pre-trained language model with 2.6 billion parameters trained on 100GB of data, demonstrating strong performance on various Chinese NLP tasks, especially in few-shot and zero-shot settings.
Contribution
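The paper presents CPM, which the authors believe to be the largest Chinese pre-trained language model at the time of release. It is built with generative pre-training on 100GB of Chinese data and is intended to support downstream tasks such as conversation, essay generation, cloze tests, and language understanding.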
Findings
CPM achieves state-of-the-art results on multiple Chinese NLP tasks.
CPM is effective in few-shot and zero-shot learning scenarios.
The code and model parameters are publicly available.
Abstract
Pre-trained Language Models (PLMs) have proven to be beneficial for various downstream NLP tasks. Recently, GPT-3, with 175 billion parameters and 570GB training data, drew a lot of attention due to the capacity of few-shot (even zero-shot) learning. However, applying GPT-3 to address Chinese NLP tasks is still challenging, as the training corpus of GPT-3 is primarily English, and the parameters are not publicly available. In this technical report, we release the Chinese Pre-trained Language Model (CPM) with generative pre-training on large-scale Chinese training data. To the best of our knowledge, CPM, with 2.6 billion parameters and 100GB Chinese training data, is the largest Chinese pre-trained language model, which could facilitate several downstream Chinese NLP tasks, such as conversation, essay generation, cloze test, and language understanding. Extensive experiments demonstrate…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Cosine Annealing · Layer Normalization · Dropout · Linear Warmup With Cosine Annealing · Residual Connection · Attention Dropout · Weight Decay
