Does Pretraining for Summarization Require Knowledge Transfer?

Kundan Krishna; Jeffrey Bigham; Zachary C. Lipton

arXiv:2109.04953·cs.CL·September 13, 2021

TL;DR

This paper investigates whether the benefits of pretraining for text summarization truly depend on knowledge transfer from large datasets. It finds that pretraining on documents built from random character n-grams nearly matches the performance of pretraining on real corpora, which calls into question whether large upstream corpora are necessary.

Contribution

The study challenges the common belief that knowledge transfer from large datasets is essential for effective summarization pretraining, showing that pretraining on synthetic documents made of randomly selected character n-grams achieves nearly the same results.

Findings

01

Random character n-gram pretraining nearly matches real corpus performance

02

Pretraining tasks inspired by summarization data structure do not significantly improve results

03

Eliminating upstream corpora may alleviate some concerns over offensive language, bias, and copyright

Abstract

Pretraining techniques leveraging enormous datasets have driven recent advances in text summarization. While folk explanations suggest that knowledge transfer accounts for pretraining's benefits, little is known about why it works or what makes a pretraining task or dataset suitable. In this paper, we challenge the knowledge transfer story, showing that by pretraining on documents consisting of character n-grams selected at random, we can nearly match the performance of models pretrained on real corpora. This work holds the promise of eliminating upstream corpora, which may alleviate some concerns over offensive language, bias, and copyright issues. To see whether the small residual benefit of using real data could be accounted for by the structure of the pretraining task, we design several tasks motivated by a qualitative study of summarization corpora. However, these tasks confer no…
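To make the phrase "documents consisting of character n-grams selected at random" concrete, here is a minimal sketch of how such a nonsense document could be generated. It is an illustration only, not the authors' procedure: the function names `make_vocab` and `make_document`, the choice of lowercase trigrams, and all sizes are assumptions introduced here.

```python
import random
import string


def make_vocab(size, n=3, seed=0):
    """Build a vocabulary of `size` distinct random character n-grams.

    Hypothetical helper: the alphabet (lowercase ASCII) and n=3 are
    assumptions for illustration, not values taken from the paper.
    """
    max_size = len(string.ascii_lowercase) ** n
    if size > max_size:
        raise ValueError(f"at most {max_size} distinct {n}-grams exist")
    rng = random.Random(seed)
    vocab = set()
    while len(vocab) < size:
        vocab.add("".join(rng.choice(string.ascii_lowercase) for _ in range(n)))
    # Sort so the result does not depend on set iteration order.
    return sorted(vocab)


def make_document(vocab, num_sentences, sentence_len, seed=0):
    """Sample a document as a list of sentences of random n-gram tokens."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(num_sentences):
        tokens = [rng.choice(vocab) for _ in range(sentence_len)]
        # Close each sentence with a period token.
        sentences.append(" ".join(tokens) + " .")
    return sentences


if __name__ == "__main__":
    vocab = make_vocab(size=50)
    for sentence in make_document(vocab, num_sentences=3, sentence_len=6):
        print(sentence)
```

A document produced this way carries no real-world knowledge, so any benefit from pretraining on it cannot come from knowledge transfer. The sketch covers only the input side; how summaries are derived from such documents for the pretraining objective is not described in this listing and is left out.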

Code & Models

Repositories

acmi-lab/pretraining-with-nonsense

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research