Unified Language Model Pre-training for Natural Language Understanding and Generation

Li Dong; Nan Yang; Wenhui Wang; Furu Wei; Xiaodong Liu; Yu Wang; Jianfeng Gao; Ming Zhou; Hsiao-Wuen Hon

arXiv:1905.03197 · cs.CL · October 16, 2019 · 949 cites

TL;DR

This paper introduces UniLM, a unified pre-trained language model that can be fine-tuned for both natural language understanding and generation tasks. It uses a single shared Transformer network whose self-attention masks control what context each prediction conditions on.

Contribution

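The paper proposes a unified pre-training approach in which a single shared Transformer is pre-trained on three language modeling tasks (unidirectional, bidirectional, and sequence-to-sequence prediction), so that one model can be fine-tuned for both understanding and generation tasks.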

Findings

01

Compares favorably with BERT on the GLUE benchmark and on the SQuAD 2.0 and CoQA question answering tasks

02

Achieves new state-of-the-art results on five natural language generation datasets

03

Improves abstractive summarization ROUGE-L to 40.51 on CNN/DailyMail (2.04 absolute improvement) and to 35.75 on Gigaword (0.86 absolute improvement)

Abstract

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UniLM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, UniLM achieves new state-of-the-art results on five natural language generation datasets, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.51 (2.04 absolute improvement), the Gigaword abstractive summarization ROUGE-L to 35.75 (0.86 absolute improvement), the CoQA generative…
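
The abstract says the three pre-training objectives share one Transformer and differ only in the self-attention mask. The following is a minimal NumPy sketch of what such masks could look like, not the authors' implementation. The function name `unilm_attention_mask`, the 0/1 encoding, and the split of the input into a source segment followed by a target segment are assumptions made for illustration.

```python
import numpy as np


def unilm_attention_mask(src_len: int, tgt_len: int, mode: str) -> np.ndarray:
    """Hypothetical helper: return an (n, n) 0/1 matrix for n = src_len + tgt_len.

    mask[i, j] == 1 means position i is allowed to attend to position j.
    """
    n = src_len + tgt_len
    if mode == "bidirectional":
        # Every token sees every other token (BERT-style context).
        return np.ones((n, n), dtype=np.int64)
    if mode == "unidirectional":
        # Left-to-right: each token sees itself and the tokens before it.
        return np.tril(np.ones((n, n), dtype=np.int64))
    if mode == "seq2seq":
        mask = np.zeros((n, n), dtype=np.int64)
        # All positions (source and target) may attend to the whole source.
        mask[:, :src_len] = 1
        # Target tokens additionally see themselves and earlier target tokens.
        mask[src_len:, src_len:] = np.tril(
            np.ones((tgt_len, tgt_len), dtype=np.int64)
        )
        # Source-to-target entries stay 0: the source never sees the target.
        return mask
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("bidirectional", "unidirectional", "seq2seq"):
        print(mode)
        print(unilm_attention_mask(src_len=2, tgt_len=2, mode=mode))
```

In an actual attention layer, the zero entries would typically be turned into a large negative value added to the attention scores before the softmax, so that blocked positions receive near-zero weight.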

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections