Training Dynamics for Text Summarization Models
Tanya Goyal, Jiacheng Xu, Junyi Jessy Li, Greg Durrett

TL;DR
This paper investigates the training dynamics of text summarization models, revealing how different knowledge and behaviors are learned at various stages, and proposes training modifications to enhance factuality and abstractiveness.
Contribution
It provides a detailed analysis of how summarization models learn copying and factual behaviors during fine-tuning, and introduces simple training adjustments to optimize specific model qualities.
Findings
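A propensity to copy the input is learned early in training, consistently across all datasets studied (CNN/DM, XSum, MediaSum).
Factual errors, such as hallucination of unsupported facts, are learned in later stages of training, though this behavior varies more across domains.
Simple training modifications can steer the model toward a chosen quality, improving either factuality or abstractiveness.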
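To make "copying" concrete, one common proxy for abstractiveness is the fraction of summary n-grams that do not appear in the source document. The sketch below is illustrative only: the function names, whitespace tokenization, and default n = 2 are assumptions made here, not the authors' exact metric. Tracking such a score on model outputs at successive checkpoints is one way to observe when copying behavior emerges.

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def ngram_list(tokens: List[str], n: int) -> List[NGram]:
    """Return all contiguous n-grams of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def novel_ngram_fraction(source: str, summary: str, n: int = 2) -> float:
    """Fraction of summary n-grams absent from the source.

    0.0 means every summary n-gram is copied from the source;
    1.0 means none are. Tokenization is naive lowercase whitespace
    splitting (an assumption made for this sketch).
    """
    source_ngrams: Set[NGram] = set(ngram_list(source.lower().split(), n))
    summary_ngrams = ngram_list(summary.lower().split(), n)
    if not summary_ngrams:
        # Summary is shorter than n tokens; define the score as 0.0.
        return 0.0
    novel = sum(1 for gram in summary_ngrams if gram not in source_ngrams)
    return novel / len(summary_ngrams)


if __name__ == "__main__":
    doc = "the cat sat on the mat"
    print(novel_ngram_fraction(doc, "the cat sat"))     # fully copied
    print(novel_ngram_fraction(doc, "a dog ran away"))  # fully novel
```

A lower score indicates a more extractive summary; a higher one indicates more abstraction, though not necessarily more factual content.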
Abstract
Pre-trained language models (e.g. BART) have shown impressive results when fine-tuned on large summarization datasets. However, little is understood about this fine-tuning process, including what knowledge is retained from pre-training time or how content selection and generation strategies are learnt across iterations. In this work, we analyze the training dynamics for generation models, focusing on summarization. Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, such as abstractiveness and hallucination, we study what the model learns at different stages of its fine-tuning process. We find that a propensity to copy the input is learned early in the training process consistently across all datasets studied. On the other hand, factual errors, such as hallucination of unsupported facts, are learnt in the later stages, though this behavior is more varied across…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
