Pre-training via Paraphrasing
Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, Luke Zettlemoyer

TL;DR
MARGE is an unsupervised pre-training method for sequence-to-sequence models that learns from a multi-lingual, multi-document paraphrasing objective, giving strong performance on several NLP tasks even without task-specific training.
Contribution
It introduces MARGE, a pre-training approach that jointly learns retrieval and reconstruction from a random initialization. The objective captures aspects of paraphrasing, translation, and summarization, yielding strong zero-shot and fine-tuned performance.
Findings
Achieves BLEU scores of up to 35.8 on document translation without task-specific training.
Demonstrates strong zero-shot performance on multiple NLP tasks.
Fine-tuning further enhances performance across discriminative and generative tasks.
Abstract
We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual multi-document paraphrasing objective. MARGE provides an alternative to the dominant masked language modeling paradigm, where we self-supervise the reconstruction of target text by retrieving a set of related texts (in many languages) and conditioning on them to maximize the likelihood of generating the original. We show it is possible to jointly learn to do retrieval and reconstruction, given only a random initialization. The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks. For example, with no additional task-specific training we achieve BLEU scores of up to 35.8 for document translation. We further show that fine-tuning gives strong performance on a…
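The abstract's central claim is that retrieval and reconstruction can be learned jointly. The pure-Python sketch below illustrates one way this can work. It assumes, as a simplification, that relevance is the cosine similarity between document embeddings and that this score enters the decoder as an additive bias `beta * relevance` on cross-attention logits, normalized jointly over the tokens of all retrieved documents. The function names, toy embeddings, and `beta` value are hypothetical. A real model would produce the embeddings with a learned encoder and train everything end to end.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two document embeddings (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(target: Sequence[float],
             candidates: List[Sequence[float]],
             k: int) -> Tuple[List[int], List[float]]:
    """Return indices and relevance scores of the k candidates most similar to the target."""
    scores = [cosine(target, c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    top = order[:k]
    return top, [scores[i] for i in top]


def biased_attention(logits_per_doc: List[List[float]],
                     relevance: List[float],
                     beta: float) -> List[float]:
    """Hypothetical relevance-biased cross-attention for one decoder query.

    Each evidence token's logit is shifted by beta * (relevance of its document),
    then a single softmax runs over the tokens of ALL retrieved documents, so
    more relevant documents receive more attention mass.
    """
    biased = [logit + beta * rel
              for doc_logits, rel in zip(logits_per_doc, relevance)
              for logit in doc_logits]
    m = max(biased)  # subtract the max for numerical stability
    exps = [math.exp(b - m) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]
```

With equal raw logits, a document with higher relevance gets more attention. Because the bias depends on the relevance scores, in an autodiff framework the reconstruction loss would also send gradient to whatever encoder produces the embeddings. This is one way to see how retrieval could be learned from the reconstruction signal alone.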
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
