BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer

TL;DR
BART is a versatile denoising autoencoder pretraining method for sequence-to-sequence models that excels in text generation and comprehension tasks, achieving state-of-the-art results across multiple NLP benchmarks.
Contribution
Introduces BART, a flexible pretraining framework combining noising strategies and a Transformer architecture, unifying and extending prior models like BERT and GPT.
Findings
Achieves state-of-the-art results on summarization, question answering, and dialogue tasks.
Matches RoBERTa performance on GLUE and SQuAD with similar resources.
Improves machine translation by 1.1 BLEU over a back-translation system, using only target-language pretraining.
Abstract
We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources…
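The two noising operations named in the abstract, sentence shuffling and span in-filling, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenization, the `<mask>` string, the Poisson span-length sampler, and the `mask_ratio` and `lam` values are all assumptions chosen for illustration rather than details taken from the text above.

```python
import math
import random

MASK = "<mask>"  # assumed mask symbol; the text above does not name the token


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def permute_sentences(sentences, rng):
    """Return a shuffled copy of the sentences (sentence shuffling)."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def infill_spans(tokens, rng, mask_ratio=0.3, lam=3.0):
    """Replace randomly chosen spans with a single MASK token each.

    mask_ratio and lam are illustrative defaults (hypothetical here).
    At most round(len(tokens) * mask_ratio) tokens are removed.
    """
    budget = int(round(len(tokens) * mask_ratio))
    start_prob = mask_ratio / lam  # chance that a span starts at a position
    out, i = [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < start_prob:
            span = min(sample_poisson(rng, lam), budget, len(tokens) - i)
            out.append(MASK)  # one mask token, whatever the span length
            i += span
            budget -= span
            if span > 0:
                continue
            # A zero-length span inserts a mask and keeps the current token.
        out.append(tokens[i])
        i += 1
    return out


def corrupt(sentences, rng):
    """Shuffle the sentences, then apply span in-filling to the joined text."""
    tokens = " ".join(permute_sentences(sentences, rng)).split()
    return " ".join(infill_spans(tokens, rng))


if __name__ == "__main__":
    doc = [
        "BART corrupts text with a noising function.",
        "It then learns to reconstruct the original text.",
        "The model is a standard sequence-to-sequence Transformer.",
    ]
    print(corrupt(doc, random.Random(0)))
```

In this sketch the corrupted string would be the encoder input and the original document the reconstruction target. Because each span collapses to one mask token, the model has to infer how many tokens are missing as well as what they are.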
Code & Models
- 🤗 facebook/bart-large-mnli · model · 3.2M dl · ♡ 1553
- 🤗 facebook/bart-large-cnn · model · 2.0M dl · ♡ 1553
- 🤗 KBLab/bart-base-swedish-cased · model · 2.0k dl · ♡ 3
- 🤗 aware-ai/bart-squadv2 · model · 20 dl · ♡ 1
- 🤗 apol/dalle-mini · model · 14 dl · ♡ 9
- 🤗 ccdv/lsg-bart-base-4096 · model · 12 dl · ♡ 3
- 🤗 ccdv/lsg-bart-large-4096 · model · 8 dl
- 🤗 dalle-mini/dalle-mini · model · 192 dl · ♡ 396
- 🤗 eugenesiow/bart-paraphrase · model · 3.6k dl · ♡ 31
- 🤗 facebook/bart-base · model · 1.2M dl · ♡ 204
Taxonomy
Methods: Denoising Autoencoder · Residual Connection · BART
