InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT

Yichong Xu; Ruochen Xu; Dan Iter; Yang Liu; Shuohang Wang; Chenguang Zhu; Michael Zeng

arXiv:2305.13083 · cs.CL · May 23, 2023 · 1 citation

Open Access

TL;DR

InheritSumm is a compact summarization model distilled from GPT-3.5. It matches or exceeds GPT-3.5 in zero-shot and few-shot settings and, when fine-tuned, outperforms previous small models, making it a cost-effective alternative.

Contribution

The paper introduces InheritSumm, a summarizer distilled from GPT-3.5 that keeps strong zero-shot and few-shot summarization ability while remaining small enough to fine-tune at practical cost.

Findings

01

InheritSumm matches GPT-3.5 in zero-shot and few-shot summarization.

02

It outperforms previous small models in fine-tuning scenarios.

03

InheritSumm is suitable for both prefix-tuning and full-data fine-tuning.

Abstract

While large models such as GPT-3 demonstrate exceptional performance in zero-shot and few-shot summarization tasks, their extensive serving and fine-tuning costs hinder their use in many applications. Conversely, previous studies have found that although automatic metrics tend to favor smaller fine-tuned models, human evaluators judge the summaries these models generate to be inferior to those of larger models like GPT-3. To address this issue, we propose InheritSumm, a versatile and compact summarization model derived from GPT-3.5 through distillation. InheritSumm not only exhibits zero-shot and few-shot summarization capabilities comparable to GPT-3.5 but is also compact enough to fine-tune. Experimental results demonstrate that InheritSumm achieves similar or superior performance to GPT-3.5 in zero-shot and few-shot settings. Furthermore, it…
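The abstract states only that InheritSumm is "derived from GPT-3.5 through distillation." One common form of this is sequence-level knowledge distillation, in which summaries generated by the teacher become the training targets for the smaller student. The sketch below is an assumption, not the paper's training code. It computes that objective on toy data, with each student step represented as an explicit probability table. The functions `sequence_nll` and `distillation_loss` and the toy vocabulary are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical types: one probability table per decoding step of the student.
StepDist = Dict[str, float]
Example = Tuple[List[StepDist], List[str]]  # (student step distributions, teacher summary tokens)


def sequence_nll(student_probs: List[StepDist], teacher_summary: List[str]) -> float:
    """Negative log-likelihood of the teacher's summary tokens under the student.

    In a real model, student_probs[t] would be conditioned on the source document
    and on the teacher's tokens before step t (teacher forcing).
    """
    if len(student_probs) != len(teacher_summary):
        raise ValueError("need one student distribution per teacher token")
    nll = 0.0
    for dist, token in zip(student_probs, teacher_summary):
        # A small floor avoids log(0) when the student assigns no mass to a token.
        nll -= math.log(max(dist.get(token, 0.0), 1e-12))
    return nll


def distillation_loss(batch: List[Example]) -> float:
    """Average per-token NLL over a batch of teacher-labelled examples."""
    total, n_tokens = 0.0, 0
    for student_probs, teacher_summary in batch:
        total += sequence_nll(student_probs, teacher_summary)
        n_tokens += len(teacher_summary)
    return total / n_tokens


if __name__ == "__main__":
    # Toy example: the teacher summarized a document as "cats sleep".
    student = [{"cats": 0.5, "dogs": 0.5}, {"sleep": 0.25, "run": 0.75}]
    teacher = ["cats", "sleep"]
    print(sequence_nll(student, teacher))         # equals ln(8)
    print(distillation_loss([(student, teacher)]))  # equals ln(8) / 2
```

Minimizing this loss pushes the student toward reproducing the teacher's outputs. Real setups would compute it from a neural student's softmax outputs rather than from hand-written tables.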

Taxonomy

Topics: Topic Modeling · Algorithms and Data Compression · Machine Learning and Data Classification

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer