Summarization from Leaderboards to Practice: Choosing A Representation Backbone and Ensuring Robustness
David Demeter, Oshin Agarwal, Simon Ben Igeri, Marko Sterbentz, Neil Molino, John M. Conroy, Ani Nenkova

TL;DR
This paper analyzes how to select and improve the components of a customer-facing summarization system. It stresses backbone choice, cross-domain robustness, and the need for heterogeneous-domain benchmarks; BART outperforms PEGASUS and T5 in both automatic and human evaluation.
Contribution
It provides empirical insights into backbone selection and cross-domain performance, and it highlights the need for heterogeneous-domain benchmarks for summarization systems.
Findings
BART outperforms PEGASUS and T5 in both automatic and human evaluation.
Summarizers perform considerably worse when applied cross-domain.
A system fine-tuned on heterogeneous domains performs well on all domains.
Abstract
Academic literature does not give much guidance on how to build the best possible customer-facing summarization system from existing research components. Here we present analyses to inform the selection of a system backbone from popular models; we find that in both automatic and human evaluation, BART performs better than PEGASUS and T5. We also find that when applied cross-domain, summarizers exhibit considerably worse performance. At the same time, a system fine-tuned on heterogeneous domains performs well on all domains and will be most suitable for a broad-domain summarizer. Our work highlights the need for heterogeneous domain summarization benchmarks. We find considerable variation in system output that can be captured only with human evaluation and is thus unlikely to be reflected in standard leaderboards with only automatic evaluation.
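The abstract does not say how the heterogeneous fine-tuning data were assembled. As a rough illustration only, the sketch below pools an equal number of examples from each domain and shuffles them before fine-tuning. The function name `build_mixed_training_set`, the equal-per-domain sampling, and the `(document, summary)` tuple layout are all assumptions made for this example, not the authors' procedure.

```python
import random
from typing import Dict, List, Tuple

# Assumed layout for one training example: (document, reference summary).
Example = Tuple[str, str]


def build_mixed_training_set(
    domains: Dict[str, List[Example]],
    per_domain: int,
    seed: int = 0,
) -> List[Tuple[str, Example]]:
    """Pool up to `per_domain` examples from every domain, then shuffle.

    Hypothetical helper: equal sampling per domain is an assumption,
    not a recipe taken from the paper.
    """
    rng = random.Random(seed)
    mixed: List[Tuple[str, Example]] = []
    for name in sorted(domains):  # sorted so the result is reproducible
        examples = domains[name]
        k = min(per_domain, len(examples))  # small domains contribute all they have
        for example in rng.sample(examples, k):
            mixed.append((name, example))  # keep the domain label for later analysis
    rng.shuffle(mixed)  # interleave domains so batches are not single-domain
    return mixed
```

Keeping the domain label next to each example makes it possible to report scores per domain afterwards, which is the kind of breakdown a heterogeneous benchmark would need.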
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Gated Linear Unit · Attention Is All You Need · Attention Dropout · Dense Connections · Linear Layer · SentencePiece · Layer Normalization · Multi-Head Attention · Adam
