ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Dan, Zhang, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning, Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang,, Jingyu Sun, Juanzi Li, Lei Zhao, Lindong Wu

TL;DR
ChatGLM is a family of large language models, with the GLM-4 series achieving performance comparable to or surpassing GPT-4 across various benchmarks, and includes tools for complex tasks like web browsing and code execution.
Contribution
This paper introduces the GLM-4 series of large language models, demonstrating state-of-the-art performance and multi-tool capabilities, with extensive open-source releases for community use.
Findings
GLM-4 models outperform GPT-4 on multiple benchmarks.
GLM-4 achieves near GPT-4-Turbo in instruction following.
Open-source models attract over 10 million downloads in 2023.
Abstract
We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained on ten trillions of tokens mostly in Chinese and English, along with a small set of corpus from 24 languages, and aligned primarily for Chinese and English usage. The high-quality alignment is achieved via a multi-stage post-training process, which involves supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4 1) closely rivals or outperforms GPT-4 in terms of general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval, 2) gets close to GPT-4-Turbo in…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
- 🤗zai-org/glm-4-9b-chatmodel· 165k dl· ♡ 704165k dl♡ 704
- 🤗zai-org/glm-4-9bmodel· 8.0k dl· ♡ 1448.0k dl♡ 144
- 🤗zai-org/glm-4-9b-chat-1mmodel· 13k dl· ♡ 20113k dl♡ 201
- 🤗zai-org/GLM-Z1-9B-0414model· 2.1k dl· ♡ 862.1k dl♡ 86
- 🤗zai-org/chatglm-6bmodel· 2.7k dl· ♡ 28762.7k dl♡ 2876
- 🤗zai-org/chatglm-6b-int4model· 869 dl· ♡ 416869 dl♡ 416
- 🤗zai-org/chatglm-6b-int8model· 61 dl· ♡ 7061 dl♡ 70
- 🤗zai-org/visualglm-6bmodel· 169 dl· ♡ 210169 dl♡ 210
- 🤗zai-org/chatglm2-6bmodel· 431k dl· ♡ 2057431k dl♡ 2057
- 🤗zai-org/chatglm2-6b-int4model· 714 dl· ♡ 237714 dl♡ 237
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling
MethodsSparse Evolutionary Training · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention
