GLM-130B: An Open Bilingual Pre-trained Model

Aohan Zeng; Xiao Liu; Zhengxiao Du; Zihan Wang; Hanyu Lai; Ming Ding; Zhuoyi Yang; Yifan Xu; Wendi Zheng; Xiao Xia; Weng Lam Tam; Zixuan Ma; Yufei Xue; Jidong Zhai; Wenguang Chen; Peng Zhang; Yuxiao Dong; Jie Tang

arXiv:2210.02414·cs.CL·October 26, 2023·295 cites

TL;DR

GLM-130B is an openly accessible bilingual (English and Chinese) pre-trained language model with 130 billion parameters. It outperforms GPT-3 175B on English benchmarks and ERNIE TITAN 3.0 260B on Chinese benchmarks, and it comes with efficient training and inference strategies.

Contribution

This paper introduces GLM-130B, a 130-billion-parameter bilingual model with novel training strategies, stability solutions, and efficient quantization enabling affordable inference.

Findings

01. Outperforms GPT-3 175B on English benchmarks
02. Significantly outperforms the Chinese model ERNIE TITAN 3.0 260B
03. Achieves effective INT4 quantization with minimal performance loss
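
To make the INT4 finding concrete, the following is a minimal sketch of symmetric absmax weight quantization to a signed 4-bit range. It is an illustration under stated assumptions, not the paper's implementation: the function names `quantize_row` and `dequantize_row`, the per-row scale, and the range [-7, 7] are all hypothetical choices made for this example.

```python
from typing import List, Tuple

INT4_MAX = 7  # assumed symmetric signed 4-bit range: [-7, 7]


def quantize_row(weights: List[float]) -> Tuple[List[int], float]:
    """Map one row of float weights to integer codes in [-7, 7] plus a scale."""
    absmax = max((abs(w) for w in weights), default=0.0)
    # Avoid dividing by zero for an all-zero (or empty) row.
    scale = absmax / INT4_MAX if absmax > 0 else 1.0
    codes = [max(-INT4_MAX, min(INT4_MAX, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_row(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical example row; real weight matrices have thousands of columns.
row = [3.5, -1.0, 0.0, 0.6]
codes, scale = quantize_row(row)
restored = dequantize_row(codes, scale)
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

In this sketch each code fits in 4 bits, so two codes could be packed into one byte, and the rounding error per weight is at most half the scale. Only the stored weights are compressed here; how activations are handled is outside the scope of the example.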

Abstract

We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as good as GPT-3 (davinci) and unveil how models of such a scale can be successfully pre-trained. Over the course of this effort, we face numerous unexpected technical and engineering challenges, particularly on loss spikes and divergence. In this paper, we introduce the training process of GLM-130B including its design choices, training strategies for both efficiency and stability, and engineering efforts. The resultant GLM-130B model offers significant outperformance over GPT-3 175B (davinci) on a wide range of popular English benchmarks while the performance advantage is not observed in OPT-175B and BLOOM-176B. It also consistently and significantly outperforms ERNIE TITAN 3.0 260B -- the largest Chinese language…

Code & Models

Videos

GLM-130B: An Open Bilingual Pre-trained Model · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · GLM · ERNIE · Linear Layer · Dropout · Layer Normalization · Cosine Annealing · Residual Connection