SKT5SciSumm -- Revisiting Extractive-Generative Approach for Multi-Document Scientific Summarization
Huy Quoc To, Ming Liu, Guangyan Huang, Hung-Nghiep Tran, André Greiner-Petter, Felix Beierle, Akiko Aizawa

TL;DR
This paper introduces SKT5SciSumm, a hybrid framework for multi-document scientific summarization that pairs SPECTER sentence embeddings with T5 models and reports state-of-the-art results on the Multi-XScience dataset while using less complex models.
Contribution
The paper presents a hybrid approach: citation-informed sentence embeddings (SPECTER) with k-means clustering select sentences for an extractive summary, and T5 models turn the extracted sentences into an abstractive summary.
Findings
Achieves state-of-the-art performance on the Multi-XScience dataset
Obtains these results with less complex models
Demonstrates the effectiveness of combining sentence embeddings with T5 for scientific summarization
Abstract
Summarization of scientific text has shown significant benefits for both the research community and society at large. Because scientific text is distinctive in nature and the input to the multi-document summarization task is substantially long, the task requires adequate embedding generation and text truncation that does not lose important information. To tackle these issues, we propose SKT5SciSumm, a hybrid framework for multi-document scientific summarization (MDSS). We leverage the Sentence-Transformer version of Scientific Paper Embeddings using Citation-Informed Transformers (SPECTER) to encode and represent sentences, which allows efficient extractive summarization using k-means clustering. We then employ the T5 family of models to generate abstractive summaries from the extracted sentences. SKT5SciSumm achieves state-of-the-art performance on the…
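The extractive stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_embed` is a hypothetical bag-of-words stand-in for the SPECTER sentence encoder, `kmeans` and `extract` are illustrative names, and picking the sentence nearest each centroid is an assumption about how clusters are turned into an extract. The abstractive T5 stage is omitted; in the described pipeline the selected sentences would be passed to a T5 model as input.

```python
import random
from collections import Counter


def toy_embed(sentence, vocab):
    """Hypothetical stand-in for a SPECTER encoder: bag-of-words counts."""
    counts = Counter(sentence.lower().split())
    return [float(counts[w]) for w in vocab]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the list of centroids."""
    k = min(k, len(points))
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def extract(sentences, k, embed):
    """Pick the sentence closest to each centroid (assumed selection rule),
    returned in original document order."""
    vectors = [embed(s) for s in sentences]
    chosen = set()
    for c in kmeans(vectors, k):
        chosen.add(min(range(len(vectors)), key=lambda i: sq_dist(vectors[i], c)))
    return [sentences[i] for i in sorted(chosen)]


# Toy input: sentences pooled from several (made-up) source documents.
sentences = [
    "graph neural networks model citation structure",
    "citation graphs link related scientific papers",
    "transformers generate fluent abstractive summaries",
    "abstractive summaries are generated by sequence models",
]
vocab = sorted({w for s in sentences for w in s.lower().split()})
extracted = extract(sentences, k=2, embed=lambda s: toy_embed(s, vocab))
print(extracted)  # this shortened text would be the input to the T5 stage
```

Clustering before generation is what keeps the generator's input short: at most `k` sentences are kept regardless of how many source documents are pooled, so the truncation is content-driven rather than positional.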
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques · Data Quality and Management
Methods: Gated Linear Unit · Linear Layer · Byte Pair Encoding · Inverse Square Root Schedule · Dropout · Attention Dropout · Dense Connections · SentencePiece · Attention Is All You Need
