Enhancing Abstractiveness of Summarization Models through Calibrated Distillation
Hwanjun Song, Igor Shalyminov, Hang Su, Siffi Singh, Kaisheng Yao, Saab Mansour

TL;DR
This paper introduces DisCal, a novel distillation method that enhances the abstractiveness of summarization models without losing informativeness, by leveraging diverse pseudo summaries and ranking supervision.
Contribution
DisCal is a new approach that improves the abstractiveness of summarization models through dual supervision using pseudo summaries and ranking, outperforming prior distillation methods.
Findings
DisCal produces more abstractive summaries, which share fewer n-grams with the source document.
DisCal maintains or improves ROUGE scores compared to baseline methods.
DisCal outperforms previous distillation techniques in both abstractiveness and informativeness.
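The abstractiveness findings rest on n-gram overlap between a summary and its source. This page does not give the exact formula, so the following is a minimal sketch that assumes one common variant: the share of a summary's bigrams that also appear in the source, using whitespace tokenization. Lower overlap means more abstractive. The helper names `ngrams` and `ngram_overlap` are hypothetical.

```python
# Hypothetical helpers: the exact overlap metric used for DisCal is an assumption here.
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_overlap(summary, source, n=2):
    """Fraction of the summary's n-grams that also occur in the source.

    Lower values mean the summary copies less, i.e. it is more abstractive.
    Tokenization is plain lowercase whitespace splitting (an assumption).
    """
    summ_ngrams = ngrams(summary.lower().split(), n)
    if not summ_ngrams:
        return 0.0
    source_set = set(ngrams(source.lower().split(), n))
    copied = sum(1 for g in summ_ngrams if g in source_set)
    return copied / len(summ_ngrams)


source = "the cat sat on the mat near the door"
print(ngram_overlap("the cat sat on the mat", source))           # 1.0 (fully copied)
print(ngram_overlap("a feline rested by the entrance", source))  # 0.0 (no copied bigrams)
```

A copied span scores 1.0 and a fully reworded summary scores 0.0; real evaluations typically report this at several n-gram sizes.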
Abstract
Sequence-level knowledge distillation reduces the size of Seq2Seq models for more efficient abstractive summarization. However, it often leads to a loss of abstractiveness in the generated summaries. In this paper, we propose a novel approach named DisCal to enhance the level of abstractiveness (measured by n-gram overlap) without sacrificing the informativeness (measured by ROUGE) of generated summaries. DisCal exposes the student model to diverse pseudo summaries with two types of supervision. First, the best pseudo summary is identified in terms of both abstractiveness and informativeness and is used for sequence-level distillation. Second, the ranks of the pseudo summaries are used to ensure that the student model assigns higher prediction scores to higher-ranked summaries. Our experiments show that DisCal outperforms prior methods in abstractive summarization distillation, producing highly abstractive and informative…
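The two supervision signals in the abstract can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: `unigram_f1` is a hypothetical stand-in for ROUGE, `novelty` is one minus bigram overlap with the source, the weighted mix in `calibrated_score` (with hypothetical weight `lam`) is assumed rather than taken from the paper, and `ranking_loss` is a BRIO-style pairwise margin loss swapped in as one common way to enforce "higher rank, higher score". Scores are plain floats here; in training they would be differentiable tensors from the student model.

```python
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unigram_f1(candidate, reference):
    """Hypothetical stand-in for ROUGE-1 F1 (clipped unigram overlap)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    hits = sum((Counter(cand) & Counter(ref)).values())
    return 2 * hits / (len(cand) + len(ref))  # equals 2PR / (P + R)


def novelty(candidate, source, n=2):
    """1 minus the share of candidate n-grams copied from the source; higher = more abstractive."""
    cand = ngrams(candidate.lower().split(), n)
    if not cand:
        return 0.0
    src = set(ngrams(source.lower().split(), n))
    return 1.0 - sum(g in src for g in cand) / len(cand)


def calibrated_score(candidate, reference, source, lam=0.2):
    """Hypothetical weighted mix of informativeness and abstractiveness."""
    return (1 - lam) * unigram_f1(candidate, reference) + lam * novelty(candidate, source)


def rank_candidates(candidates, reference, source, lam=0.2):
    """Sort pseudo summaries best-first; ranked[0] would be the distillation target."""
    return sorted(candidates, key=lambda c: calibrated_score(c, reference, source, lam), reverse=True)


def sequence_score(token_logprobs, alpha=1.0):
    """Length-normalized log-probability the student assigns to one summary (assumed form)."""
    return sum(token_logprobs) / (len(token_logprobs) ** alpha)


def ranking_loss(scores, margin=0.001):
    """Pairwise margin loss; scores[k] is the student's score for the k-th ranked summary."""
    loss = 0.0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            # Penalize unless lower-ranked summary j scores at least (j - i) * margin below i.
            loss += max(0.0, scores[j] - scores[i] + (j - i) * margin)
    return loss


source = "the cat sat on the mat near the door"
reference = "a cat rested on a mat by the door"
candidates = ["the cat sat on the mat", "a cat rested near the door", "the dog barked"]
ranked = rank_candidates(candidates, reference, source)
print(ranked[0])                          # a cat rested near the door
print(ranking_loss([-1.0, -2.0, -3.0]))  # 0.0 (student scores already follow the ranking)
```

The fully copied candidate keeps decent informativeness but earns no novelty, so the reworded candidate wins. Raising `lam` would favor abstractiveness more strongly, at the risk of promoting uninformative summaries such as "the dog barked".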
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Knowledge Distillation · Sequence to Sequence
