BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Mike Lewis; Yinhan Liu; Naman Goyal; Marjan Ghazvininejad; Abdelrahman Mohamed; Omer Levy; Ves Stoyanov; Luke Zettlemoyer

arXiv:1910.13461·cs.CL·October 31, 2019

TL;DR

BART is a versatile denoising autoencoder pretraining method for sequence-to-sequence models that excels in text generation and comprehension tasks, achieving state-of-the-art results across multiple NLP benchmarks.

Contribution

Introduces BART, a flexible pretraining framework combining noising strategies and a Transformer architecture, unifying and extending prior models like BERT and GPT.

Findings

01

Achieves state-of-the-art results on summarization, question answering, and dialogue tasks.

02

Matches RoBERTa performance on GLUE and SQuAD with similar resources.

03

Improves machine translation by 1.1 BLEU over a back-translation system, using only target-language pretraining.

Abstract

We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources…
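
The two noising steps named in the abstract, sentence shuffling and span in-filling, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `<mask>` string, the helper names (`shuffle_sentences`, `infill_span`, `corrupt`), whitespace tokenization, and the fixed span length are all assumptions made for illustration.

```python
import random

MASK = "<mask>"  # hypothetical mask token string, chosen for this sketch


def shuffle_sentences(sentences, rng):
    """Return a copy of the sentences in random order."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def infill_span(tokens, start, length):
    """Replace tokens[start:start + length] with a single mask token."""
    return tokens[:start] + [MASK] + tokens[start + length:]


def corrupt(sentences, rng, span_length=3):
    """Shuffle the sentences, then mask one random span in each.

    A fixed span length is a simplification assumed here; it is not
    the sampling scheme used in the paper.
    """
    corrupted = []
    for sentence in shuffle_sentences(sentences, rng):
        tokens = sentence.split()  # whitespace tokens stand in for subwords
        length = min(span_length, len(tokens))
        start = rng.randrange(len(tokens) - length + 1)
        corrupted.append(" ".join(infill_span(tokens, start, length)))
    return corrupted


original = [
    "BART corrupts text with noise .",
    "A decoder reconstructs the original text .",
]
rng = random.Random(0)
noisy = corrupt(original, rng)
# A training pair would be (noisy as encoder input, original as decoder target).
```

Because the whole span collapses to one mask token, the corrupted input does not reveal how many tokens are missing, so a model trained to reconstruct the original also has to infer the span length.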

Taxonomy

Methods: Denoising Autoencoder · Residual Connection · BART