GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang

TL;DR
The paper introduces GLM, a versatile pretraining framework using autoregressive blank infilling with 2D positional encodings, achieving superior performance across various NLP tasks compared to existing models like BERT, T5, and GPT.
Contribution
GLM is an autoregressive blank infilling model that unifies multiple pretraining objectives, improving performance across NLU, unconditional generation, and conditional generation tasks.
Findings
GLM outperforms BERT, T5, and GPT on multiple NLP benchmarks.
GLM achieves its best results from a single pretrained model with 1.25× the parameters of BERT Large.
GLM demonstrates strong generalizability across diverse NLP tasks.
Abstract
There have been various types of pretraining architectures including autoencoding models (e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5). However, none of the pretraining frameworks performs the best for all tasks of three main categories including natural language understanding (NLU), unconditional generation, and conditional generation. We propose a General Language Model (GLM) based on autoregressive blank infilling to address this challenge. GLM improves blank filling pretraining by adding 2D positional encodings and allowing an arbitrary order to predict spans, which results in performance gains over BERT and T5 on NLU tasks. Meanwhile, GLM can be pretrained for different types of tasks by varying the number and lengths of blanks. On a wide range of tasks across NLU, conditional and unconditional generation, GLM outperforms BERT, T5, and GPT…
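The abstract names two mechanisms, 2D positional encodings and arbitrary span order, without spelling them out. The following is a minimal Python sketch of how such an input could be assembled. It is an illustration, not the authors' implementation. The layout details are assumptions that go beyond the abstract shown here: each sampled span is replaced by a single [MASK] in the corrupted text ("Part A"), the spans are appended in shuffled order, each prefixed by a [START] token ("Part B"), the first position id points at the span's [MASK], and the second counts tokens within the span (0 for Part A). The function name `build_glm_input` and its parameters are hypothetical.

```python
import random

MASK, START = "[MASK]", "[START]"  # assumed special-token names


def build_glm_input(tokens, spans, order=None, seed=None):
    """Sketch of a blank-infilling input with two position ids per token.

    tokens: list of str
    spans:  sorted, non-overlapping (start, end) pairs, end exclusive
    order:  optional permutation of span indices; shuffled if None
    """
    # Part A: the corrupted text, one [MASK] per removed span.
    part_a, mask_pos, cursor = [], {}, 0
    for i, (start, end) in enumerate(spans):
        part_a.extend(tokens[cursor:start])
        part_a.append(MASK)
        mask_pos[i] = len(part_a)  # 1-based position of this span's [MASK]
        cursor = end
    part_a.extend(tokens[cursor:])

    # Spans are predicted in an arbitrary (here: shuffled) order.
    if order is None:
        order = list(range(len(spans)))
        random.Random(seed).shuffle(order)

    seq = list(part_a)
    pos1 = list(range(1, len(part_a) + 1))  # position in the corrupted text
    pos2 = [0] * len(part_a)                # intra-span position, 0 in Part A

    # Part B: each span, prefixed by [START], appended in the chosen order.
    for i in order:
        start, end = spans[i]
        span_in = [START] + tokens[start:end]
        seq.extend(span_in)
        pos1.extend([mask_pos[i]] * len(span_in))   # all point at the [MASK]
        pos2.extend(range(1, len(span_in) + 1))     # 1, 2, ... within the span
    return seq, pos1, pos2, order


if __name__ == "__main__":
    toks = ["x1", "x2", "x3", "x4", "x5", "x6"]
    seq, pos1, pos2, _ = build_glm_input(toks, [(2, 3), (4, 6)], order=[1, 0])
    print(seq)
    print(pos1)
    print(pos2)
```

With spans [x3] and [x5, x6] and the second span placed first, the sequence becomes x1 x2 [MASK] x4 [MASK] [START] x5 x6 [START] x3, with first position ids 1 2 3 4 5 5 5 5 3 3 and second position ids 0 0 0 0 0 1 2 3 1 2. Because the first id of a span token only identifies the blank it fills, the position ids alone do not reveal the span's length, which is one plausible reason for splitting position into two dimensions.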
Code & Models
- 🤗 zai-org/chatglm-6b · model · 2.7k dl · ♡ 2876
- 🤗 zai-org/glm-10b-chinese · model · 14 dl · ♡ 122
- 🤗 zai-org/glm-10b · model · 290 dl · ♡ 33
- 🤗 zai-org/glm-2b · model · 36 dl · ♡ 16
- 🤗 zai-org/glm-large-chinese · model · 137 dl · ♡ 34
- 🤗 zai-org/glm-roberta-large · model · 17 dl · ♡ 5
- 🤗 sunzeyeah/glm-10B-chinese · model · 9 dl · ♡ 3
- 🤗 sunzeyeah/glm-350M-chinese · model · 12 dl · ♡ 3
- 🤗 lykeven/uptest · model · 2 dl · ♡ 1
- 🤗 NewBreaker/ChatGLM-6B · model · 2 dl
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Gated Linear Unit · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · GPT · Inverse Square Root Schedule · Byte Pair Encoding · SentencePiece · Adafactor
