PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation

Bin Bi; Chenliang Li; Chen Wu; Ming Yan; Wei Wang; Songfang Huang; Fei Huang; Luo Si

arXiv:2004.07159 · cs.CL · September 22, 2020 · 20 cites

TL;DR

PALM introduces a pre-training scheme that combines autoencoding and autoregressive objectives and is designed for context-conditioned text generation. It reports state-of-the-art results on multiple benchmarks, including MARCO question answering and CNN/DailyMail summarization.

Contribution

The paper proposes a new pre-training approach that jointly optimizes autoencoding and autoregressive objectives to better align with language generation tasks.
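
As a rough illustration of what jointly optimizing the two objectives could look like at the data level, the sketch below builds one training example from a token sequence: the first part becomes a masked context (the autoencoding side, where masked tokens are recovered), and the remainder becomes the text to be generated left to right (the autoregressive side). The function name `build_example`, the `[MASK]` and `<s>` symbols, the 80/20 split, and the 15% masking rate are all assumptions made for illustration; none of them are taken from the paper.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper
BOS = "<s>"      # hypothetical start-of-sequence symbol


def build_example(tokens, split_ratio=0.8, mask_prob=0.15, seed=0):
    """Split a token list into a masked context and a continuation.

    The ratio and masking rate are illustrative assumptions only.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form context and continuation")
    rng = random.Random(seed)

    # Keep at least one token on each side of the split.
    split = min(len(tokens) - 1, max(1, int(len(tokens) * split_ratio)))
    context, continuation = tokens[:split], tokens[split:]

    # Autoencoding side: corrupt the context and remember the originals.
    encoder_input = []
    ae_targets = {}  # position in context -> original token
    for i, tok in enumerate(context):
        if rng.random() < mask_prob:
            encoder_input.append(MASK)
            ae_targets[i] = tok
        else:
            encoder_input.append(tok)

    # Autoregressive side: predict each continuation token from the
    # previous ones (teacher forcing), conditioned on the context.
    decoder_input = [BOS] + continuation[:-1]
    decoder_target = continuation

    return {
        "encoder_input": encoder_input,
        "ae_targets": ae_targets,
        "decoder_input": decoder_input,
        "decoder_target": decoder_target,
    }


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog again".split()
    example = build_example(words, seed=1)
    for key, value in example.items():
        print(key, value)
```

In a setup like this, a model would be trained on a loss for recovering `ae_targets` from `encoder_input` together with a loss for predicting `decoder_target` from `decoder_input`. How the two terms are weighted or scheduled is not stated in this summary.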

Findings

01 Achieves state-of-the-art results on MARCO question answering

02 Sets new records in CNN/DailyMail summarization

03 Outperforms previous models in question generation and conversational response tasks

Abstract

Self-supervised pre-training, such as BERT, MASS and BART, has emerged as a powerful technique for natural language understanding and generation. Existing pre-training techniques employ autoencoding and/or autoregressive objectives to train Transformer-based models by recovering original word tokens from corrupted text with some masked tokens. The training goals of existing techniques are often inconsistent with the goals of many language generation tasks, such as generative question answering and conversational response generation, for producing new text given context. This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus, specifically designed for generating new text conditioned on context. The new scheme alleviates the mismatch introduced by the existing denoising scheme between pre-training…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Weight Decay · Residual Connection · Adam · Byte Pair Encoding · Layer Normalization · Softmax · Attention Is All You Need · Dropout