ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
Weizhen Qi, Yu Yan, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou

TL;DR
ProphetNet is a novel sequence-to-sequence pre-training model that predicts multiple future tokens simultaneously, leading to improved performance on summarization and question generation tasks.
Contribution
It introduces future n-gram prediction and n-stream self-attention, enabling the model to plan ahead and outperform previous models on key benchmarks.
Findings
Achieves state-of-the-art results on CNN/DailyMail, Gigaword, and SQuAD 1.1.
Outperforms previous models pre-trained on data of the same scale.
Demonstrates the effectiveness of future n-gram prediction in sequence modeling.
Abstract
This paper presents a new sequence-to-sequence pre-training model called ProphetNet, which introduces a novel self-supervised objective named future n-gram prediction and the proposed n-stream self-attention mechanism. Instead of optimizing one-step-ahead prediction in the traditional sequence-to-sequence model, the ProphetNet is optimized by n-step ahead prediction that predicts the next n tokens simultaneously based on previous context tokens at each time step. The future n-gram prediction explicitly encourages the model to plan for the future tokens and prevent overfitting on strong local correlations. We pre-train ProphetNet using a base scale dataset (16GB) and a large-scale dataset (160GB), respectively. Then we conduct experiments on CNN/DailyMail, Gigaword, and SQuAD 1.1 benchmarks for abstractive summarization and question generation tasks. Experimental results show that…
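The future n-gram objective described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `future_ngram_targets` and `future_ngram_loss`, the per-offset weights `alphas`, and the example log-probabilities are all hypothetical, and the encoder input is left out for brevity. The sketch only shows how each decoding step is paired with the next n tokens instead of a single next token, and how the per-offset losses might be combined as a weighted sum.

```python
import math


def future_ngram_targets(tokens, n):
    """Pair each prefix tokens[:t] with the (up to) n tokens that follow it.

    With n=1 this reduces to ordinary next-token prediction.
    Steps near the end of the sequence get fewer than n targets.
    """
    return [(tokens[:t], tokens[t:t + n]) for t in range(len(tokens))]


def future_ngram_loss(log_probs, alphas):
    """Weighted negative log-likelihood over all future offsets.

    log_probs[t][j] is the log-probability a (hypothetical) model assigns,
    at step t, to the true token j positions ahead. alphas[j] is an assumed
    weight for offset j; the values used below are illustrative only.
    """
    total = 0.0
    for row in log_probs:
        for j, lp in enumerate(row):
            total -= alphas[j] * lp
    return total


# Toy example: predict 2 tokens ahead at every step.
pairs = future_ngram_targets(["the", "cat", "sat", "down"], n=2)
for context, targets in pairs:
    print(context, "->", targets)

# Made-up model probabilities for a two-step sequence.
toy_log_probs = [[math.log(0.5), math.log(0.25)], [math.log(0.5)]]
loss = future_ngram_loss(toy_log_probs, alphas=[1.0, 0.5])
print(round(loss, 4))
```

Setting `n=1` and `alphas=[1.0]` recovers the usual one-step-ahead training signal, which makes the difference between the two objectives easy to compare on the same toy sequence.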
Code & Models
- 🤗 microsoft/prophetnet-large-uncased-cnndm · model · 510 dl · ♡ 2
- 🤗 microsoft/prophetnet-large-uncased-squad-qg · model · 508 dl · ♡ 7
- 🤗 microsoft/prophetnet-large-uncased · model · 129k dl · ♡ 6
- 🤗 microsoft/xprophetnet-large-wiki100-cased-xglue-ntg · model · 37 dl
- 🤗 microsoft/xprophetnet-large-wiki100-cased-xglue-qg · model · 15 dl
- 🤗 microsoft/xprophetnet-large-wiki100-cased · model · 101 dl · ♡ 2
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: ProphetNet
