GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, Jie Tang

TL;DR
GLM-130B is an openly accessible, 130-billion-parameter bilingual (English and Chinese) pre-trained language model that outperforms some of the largest models on English and Chinese benchmarks and comes with efficient training and inference strategies.
Contribution
This paper introduces GLM-130B, a 130-billion-parameter bilingual model with novel training strategies, stability solutions, and efficient quantization enabling affordable inference.
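To see why quantization matters for affordable inference, the following back-of-the-envelope sketch estimates the weights-only memory footprint of a 130-billion-parameter model at different precisions. It is plain arithmetic rather than a measurement: the function name and the choice of 1 GB = 1e9 bytes are assumptions made for illustration, and activations, caches, and quantization metadata are ignored.

```python
# Rough weights-only memory estimate for a 130-billion-parameter model.
# Assumption: every weight is stored at the given bit width, with no overhead.
PARAMS = 130e9  # parameter count taken from the model name

BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "INT4": 4}


def weight_memory_gb(num_params, bits):
    """Return the weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


footprints = {
    name: weight_memory_gb(PARAMS, bits)
    for name, bits in BITS_PER_WEIGHT.items()
}

for name, gb in footprints.items():
    print(f"{name}: {gb:.0f} GB")
```

Under these assumptions, halving the bit width halves the weight memory, which is the main lever for fitting a model of this size on fewer or smaller GPUs.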
Findings
Outperforms GPT-3 175B on English benchmarks
Significantly outperforms Chinese model ERNIE TITAN 3.0 260B
Achieves effective INT4 quantization with minimal performance loss
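The INT4 finding above can be illustrated with a minimal sketch of symmetric absmax quantization, one common way to map floating-point weights to 4-bit integers. This summary does not specify the exact scheme used for GLM-130B, so the helper names (`quantize_int4`, `dequantize_int4`), the symmetric range [-7, 7], and the sample weights are assumptions chosen for illustration only.

```python
# Minimal sketch of symmetric absmax quantization to 4-bit integers.
# Assumption: one scale per row of weights, symmetric integer range [-7, 7].

def quantize_int4(weights):
    """Quantize a list of floats to integers in [-7, 7] plus a scale factor."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 7 if absmax > 0 else 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int4(q, scale):
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]


row = [0.12, -0.50, 0.33, 0.07, -0.21]  # hypothetical weight values
q, scale = quantize_int4(row)
restored = dequantize_int4(q, scale)

# Per-weight rounding error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(q, scale, max_error)
```

The largest-magnitude weight maps to the edge of the integer range, and every other weight is off by at most half a step (`scale / 2`). How well this holds up in practice depends on the weight distribution.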
Abstract
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as good as GPT-3 (davinci) and unveil how models of such a scale can be successfully pre-trained. Over the course of this effort, we face numerous unexpected technical and engineering challenges, particularly on loss spikes and divergence. In this paper, we introduce the training process of GLM-130B including its design choices, training strategies for both efficiency and stability, and engineering efforts. The resultant GLM-130B model offers significant outperformance over GPT-3 175B (davinci) on a wide range of popular English benchmarks while the performance advantage is not observed in OPT-175B and BLOOM-176B. It also consistently and significantly outperforms ERNIE TITAN 3.0 260B -- the largest Chinese language…
Code & Models
- 🤗 zai-org/chatglm-6b · model · 2.7k dl · ♡ 2876
- 🤗 lykeven/uptest · model · 2 dl · ♡ 1
- 🤗 NewBreaker/ChatGLM-6B · model · 2 dl
- 🤗 ljsabc/ChatGLM-prefix-tuning · model · 17 dl · ♡ 1
- 🤗 zai-org/visualglm-6b · model · 169 dl · ♡ 210
- 🤗 fengyan/chatglm-6B · model · 3 dl
- 🤗 HasturOfficial/chatglm-6b · model · 13 dl
- 🤗 zai-org/chatglm2-6b · model · 431k dl · ♡ 2057
- 🤗 sharpbai/chatglm2-6b · model · 12 dl · ♡ 1
- 🤗 zai-org/chatglm2-6b-int4 · model · 714 dl · ♡ 237
Taxonomy
Topics: Advanced Neural Network Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · GLM · ERNIE · Linear Layer · Dropout · Layer Normalization · Cosine Annealing · Residual Connection
