GLM: General Language Model Pretraining with Autoregressive Blank Infilling

Zhengxiao Du; Yujie Qian; Xiao Liu; Ming Ding; Jiezhong Qiu; Zhilin Yang; Jie Tang

arXiv:2103.10360·cs.CL·March 18, 2022·21 cites

Open Access 5 Repos 10 Models

TL;DR

The paper introduces GLM, a versatile pretraining framework using autoregressive blank infilling with 2D positional encodings, achieving superior performance across various NLP tasks compared to existing models like BERT, T5, and GPT.

Contribution

GLM is a novel autoregressive blank infilling model that unifies multiple pretraining objectives, improving performance across NLU, unconditional generation, and conditional generation tasks.

Findings

01 GLM outperforms BERT, T5, and GPT on multiple NLP benchmarks.

02 GLM achieves its best results with 1.25× the parameters of BERT Large.

03 GLM demonstrates strong generalizability across diverse NLP tasks.

Abstract

There have been various types of pretraining architectures including autoencoding models (e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5). However, none of the pretraining frameworks performs the best for all tasks of three main categories including natural language understanding (NLU), unconditional generation, and conditional generation. We propose a General Language Model (GLM) based on autoregressive blank infilling to address this challenge. GLM improves blank filling pretraining by adding 2D positional encodings and allowing an arbitrary order to predict spans, which results in performance gains over BERT and T5 on NLU tasks. Meanwhile, GLM can be pretrained for different types of tasks by varying the number and lengths of blanks. On a wide range of tasks across NLU, conditional and unconditional generation, GLM outperforms BERT, T5, and GPT…
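The blank infilling input described in the abstract can be illustrated with a small sketch. The Python below is not the authors' code; it is a minimal illustration based on one reading of the method. The function names, the `[MASK]`/`[START]`/`[END]` token strings, the 0-based position indices, and the attention rule (the corrupted text is read bidirectionally, the filled spans are generated left to right) are all assumptions made for illustration. The first position id points at the blank a token belongs to, and the second counts the offset inside that blank.

```python
import random

# Hypothetical special-token strings, chosen for illustration only.
MASK, START, END = "[MASK]", "[START]", "[END]"


def build_blank_infilling_example(tokens, spans, order=None, seed=0):
    """Build one blank-infilling training example.

    tokens: list of str.
    spans:  sorted, non-overlapping (start, end) half-open index pairs to blank out.
    order:  optional permutation of span indices; shuffled at random if omitted.
    """
    # Part A: the corrupted text, with each span collapsed to a single [MASK].
    part_a, mask_pos, cursor = [], [], 0
    for start, end in spans:
        part_a.extend(tokens[cursor:start])
        mask_pos.append(len(part_a))  # where this span's [MASK] sits in Part A
        part_a.append(MASK)
        cursor = end
    part_a.extend(tokens[cursor:])

    inputs = list(part_a)
    targets = [None] * len(part_a)   # no prediction loss on Part A
    pos1 = list(range(len(part_a)))  # first dimension: position in corrupted text
    pos2 = [0] * len(part_a)         # second dimension: 0 everywhere in Part A

    # Part B: the blanked spans, appended in an arbitrary (shuffled) order.
    if order is None:
        order = list(range(len(spans)))
        random.Random(seed).shuffle(order)
    for i in order:
        start, end = spans[i]
        span = tokens[start:end]
        inputs.extend([START] + span)              # decoder-style input
        targets.extend(span + [END])               # next-token targets
        pos1.extend([mask_pos[i]] * (len(span) + 1))  # all point at their [MASK]
        pos2.extend(range(1, len(span) + 2))       # intra-span offsets 1, 2, ...

    return {"inputs": inputs, "targets": targets, "pos1": pos1,
            "pos2": pos2, "part_a_len": len(part_a)}


def attention_mask(total_len, part_a_len):
    """allowed[i][j] is True when position i may attend to position j.

    Part A sees all of Part A; Part B sees Part A plus earlier Part B tokens.
    """
    return [[j < part_a_len or (i >= part_a_len and j <= i)
             for j in range(total_len)]
            for i in range(total_len)]


if __name__ == "__main__":
    example = build_blank_infilling_example(
        ["x1", "x2", "x3", "x4", "x5", "x6"], spans=[(2, 3), (4, 6)], order=[1, 0])
    for key, value in example.items():
        print(key, value)
```

Under these assumptions, varying how many spans are blanked and how long they are changes the kind of task the same routine prepares: many short spans resemble a masked-language-model objective, while one long span at the end resembles left-to-right generation.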

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Gated Linear Unit · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · GPT · Inverse Square Root Schedule · Byte Pair Encoding · SentencePiece · Adafactor