ELMER: A Non-Autoregressive Pre-trained Language Model for Efficient and Effective Text Generation

Junyi Li; Tianyi Tang; Wayne Xin Zhao; Jian-Yun Nie; Ji-Rong Wen

arXiv:2210.13304·cs.CL·October 31, 2022

TL;DR

ELMER is a pre-trained language model for non-autoregressive (NAR) text generation. It explicitly models token dependency by letting tokens exit at different layers according to prediction confidence, and it is pre-trained with a Layer Permutation Language Modeling objective. The result is higher generation quality than existing NAR models and over 10 times faster inference.

Contribution

ELMER introduces explicit token dependency modeling and a novel Layer Permutation Language Modeling pre-training objective for non-autoregressive text generation.

Findings

01. ELMER outperforms existing NAR models in quality.

02. ELMER narrows the performance gap with AR models.

03. ELMER achieves over 10 times faster inference speed.

Abstract

We study the text generation task under the approach of pre-trained language models (PLMs). Typically, an auto-regressive (AR) method is adopted for generating texts in a token-by-token manner. Despite many advantages of AR generation, it usually suffers from inefficient inference. Therefore, non-autoregressive (NAR) models are proposed to generate all target tokens simultaneously. However, NAR models usually generate texts of lower quality due to the absence of token dependency in the output text. In this paper, we propose ELMER: an efficient and effective PLM for NAR text generation to explicitly model the token dependency during NAR generation. By leveraging the early exit technique, ELMER enables the token generations at different layers, according to their prediction confidence (a more confident token will exit at a lower layer). Besides, we propose a novel pre-training objective,…
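
The early exit idea in the abstract (a more confident token exits at a lower layer) can be illustrated with a toy sketch. This is not the ELMER implementation: the function name `early_exit_decode`, the use of the maximum softmax probability as the confidence score, the `threshold` value, and the precomputed per-layer logits are all assumptions made for illustration. In a real model, each layer's logits would be computed from the previous layer's hidden states rather than supplied up front.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_decode(layer_logits, threshold=0.9):
    """Pick a token for every position in parallel, layer by layer.

    layer_logits[l][t] holds the (hypothetical) vocabulary logits for
    target position t at decoder layer l. A position exits at the first
    layer whose top probability reaches `threshold`; positions that never
    reach it exit at the last layer.
    Returns (tokens, exit_layers).
    """
    num_layers = len(layer_logits)
    num_positions = len(layer_logits[0])
    tokens = [None] * num_positions
    exit_layers = [None] * num_positions

    for layer in range(num_layers):
        is_last = layer == num_layers - 1
        for pos in range(num_positions):
            if tokens[pos] is not None:
                continue  # this position already exited at a lower layer
            probs = softmax(layer_logits[layer][pos])
            best = max(range(len(probs)), key=probs.__getitem__)
            if probs[best] >= threshold or is_last:
                tokens[pos] = best
                exit_layers[pos] = layer
    return tokens, exit_layers


# Toy input: 2 layers, 2 target positions, vocabulary of 3 tokens.
toy_logits = [
    [[5.0, 0.0, 0.0], [0.1, 0.0, 0.2]],  # layer 0: position 0 is confident
    [[0.0, 5.0, 0.0], [0.0, 4.0, 0.0]],  # layer 1: final layer
]
tokens, exits = early_exit_decode(toy_logits, threshold=0.9)
print(tokens, exits)  # [0, 1] [0, 1]
```

In the toy input, position 0 is already confident at layer 0 and exits there, while position 1 stays near-uniform and falls through to the final layer.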

Code & Models

Repositories

rucaibox/elmer
PyTorch · Official

Models

🤗 RUCAIBox/elmer
model · 36 dl · ♡ 4

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Layer Normalization · Byte Pair Encoding · Residual Connection · Dropout