SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section
Leandro Carísio Fernandes, Gustavo Bartz Guedes, Thiago Soares Laitz, Thales Sales Almeida, Rodrigo Nogueira, Roberto Lotufo, Jayr Pereira

TL;DR
SurveySum is a new dataset and method for summarizing multiple scientific articles into survey sections, emphasizing retrieval quality and configuration impacts for improved domain-specific summarization.
Contribution
Introduces the SurveySum dataset, proposes two pipelines for summarizing scientific articles into a survey section, and evaluates their performance with multiple metrics.
Findings
High-quality retrieval improves summary quality
Configuration choices significantly affect results
Evaluation highlights importance of retrieval stage
Abstract
Document summarization is the task of condensing texts into concise, informative summaries. This paper introduces a novel dataset designed for summarizing multiple scientific articles into a section of a survey. Our contributions are: (1) SurveySum, a new dataset addressing the gap in domain-specific summarization tools; (2) two specific pipelines to summarize scientific articles into a section of a survey; and (3) the evaluation of these pipelines using multiple metrics to compare their performance. Our results highlight the importance of high-quality retrieval stages and the impact of different configurations on the quality of generated summaries.
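The abstract does not describe the two pipelines in detail, so the following is only a minimal sketch of the general retrieve-then-summarize pattern it alludes to, not the authors' method. All names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) are hypothetical, the bag-of-words cosine ranking is a stand-in for whatever retriever the paper uses, and the final generation step (sending the prompt to a language model) is left out.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a_tokens, b_tokens):
    """Cosine similarity between two bag-of-words token lists."""
    ca, cb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(section_title, chunks, top_k=2):
    """Retrieval stage (assumed): rank article chunks against the section title."""
    query = tokenize(section_title)
    ranked = sorted(chunks, key=lambda c: cosine(query, tokenize(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(section_title, passages):
    """Summarization stage input (assumed): a prompt over the retrieved passages."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return (
        f"Write the survey section '{section_title}' "
        f"using only these passages:\n{numbered}"
    )


# Toy chunks standing in for text extracted from the cited articles.
chunks = [
    "Dense retrieval models encode queries and passages.",
    "The weather was sunny in Lisbon.",
    "Sparse retrieval with BM25 ranks passages by term overlap.",
]
selected = retrieve("Retrieval models for passages", chunks, top_k=2)
prompt = build_prompt("Retrieval models for passages", selected)
```

In a sketch like this, the summary can only draw on what `retrieve` returns, which is one way to read the finding that retrieval quality drives summary quality.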
Taxonomy
Topics: Computational and Text Analysis Methods
