Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization

Ahmed Magooda; Diane Litman

arXiv:2109.08569 · cs.CL · September 20, 2021

TL;DR

This paper presents three data manipulation techniques (synthesis, augmentation, and curriculum learning) that improve abstractive summarization models without additional data, and shows that they help across two summarization models and two small datasets.

Contribution

The paper introduces data synthesis with paraphrasing, data augmentation with sample mixing, and curriculum learning with two new difficulty metrics based on specificity and abstractiveness, all of which improve summarization performance without additional data.

Findings

01

All three techniques improve abstractive summarization across two summarization models and two small datasets

02

The techniques improve performance both in isolation and when combined

03

Methods are effective even with small datasets

Abstract

This paper explores three simple data manipulation techniques (synthesis, augmentation, curriculum) for improving abstractive summarization models without the need for any additional data. We introduce a method of data synthesis with paraphrasing, a data augmentation technique with sample mixing, and curriculum learning with two new difficulty metrics based on specificity and abstractiveness. We conduct experiments to show that these three techniques can help improve abstractive summarization across two summarization models and two different small datasets. Furthermore, we show that these techniques can improve performance when applied in isolation and when combined.
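
To make the curriculum idea concrete, the sketch below orders training pairs from less to more abstractive. The abstract does not define its abstractiveness metric, so the function `novel_ngram_ratio` is an assumption: it uses a common proxy, the fraction of summary n-grams that never appear in the source. The helper `curriculum_order` and the toy data are likewise hypothetical and only illustrate an easy-to-hard ordering, not the authors' implementation.

```python
from typing import List, Tuple


def novel_ngram_ratio(source: str, summary: str, n: int = 1) -> float:
    """Assumed abstractiveness proxy: share of summary n-grams absent from the source."""

    def ngrams(text: str) -> List[Tuple[str, ...]]:
        # Naive lowercase whitespace tokenization, enough for an illustration.
        tokens = text.lower().split()
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    summary_ngrams = ngrams(summary)
    if not summary_ngrams:
        return 0.0
    source_ngrams = set(ngrams(source))
    novel = sum(1 for gram in summary_ngrams if gram not in source_ngrams)
    return novel / len(summary_ngrams)


def curriculum_order(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (source, summary) pairs from least to most abstractive."""
    return sorted(pairs, key=lambda pair: novel_ngram_ratio(pair[0], pair[1]))


# Toy data, invented for this example only.
pairs = [
    ("the cat sat on the mat", "a feline rested"),
    ("the cat sat on the mat", "the cat sat"),
    ("the cat sat on the mat", "the cat rested"),
]
ordered = curriculum_order(pairs)
for source, summary in ordered:
    print(f"{novel_ngram_ratio(source, summary):.2f}  {summary}")
```

A training loop could then feed batches in this order, so that mostly extractive summaries are seen before highly abstractive ones. A specificity-based ordering would follow the same pattern with a different scoring function.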
