SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section

Leandro Carísio Fernandes; Gustavo Bartz Guedes; Thiago Soares Laitz; Thales Sales Almeida; Rodrigo Nogueira; Roberto Lotufo; Jayr Pereira

arXiv:2408.16444 · cs.CL · March 18, 2025

TL;DR

SurveySum is a new dataset and method for summarizing multiple scientific articles into survey sections; the accompanying experiments show how retrieval quality and pipeline configuration affect domain-specific summarization.

Contribution

Introduces the SurveySum dataset and two pipelines for summarizing scientific articles into a survey section, and evaluates both pipelines with multiple metrics.

Findings

1. High-quality retrieval improves summary quality
2. Configuration choices significantly affect results
3. Evaluation highlights importance of retrieval stage

Abstract

Document summarization is the task of condensing texts into concise and informative summaries. This paper introduces a novel dataset designed for summarizing multiple scientific articles into a section of a survey. Our contributions are: (1) SurveySum, a new dataset addressing the gap in domain-specific summarization tools; (2) two specific pipelines to summarize scientific articles into a section of a survey; and (3) the evaluation of these pipelines using multiple metrics to compare their performance. Our results highlight the importance of high-quality retrieval stages and the impact of different configurations on the quality of generated summaries.
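The pipelines described above include a retrieval stage whose quality, according to the results, strongly affects the generated summaries. As a rough illustration of that retrieve-then-generate shape, the Python sketch below ranks candidate article chunks against a survey-section title using TF-IDF cosine similarity and packs the top-k chunks into a generation prompt. It is a minimal sketch under stated assumptions: the TF-IDF ranker, the hypothetical function names `retrieve` and `build_prompt`, the `k` parameter, and the prompt wording are illustrative stand-ins rather than the authors' pipelines, and the actual call to a language model is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Build sparse TF-IDF vectors for docs (smoothed IDF; an assumption, not the paper's ranker)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per doc
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[str], k: int) -> list[str]:
    """Hypothetical retrieval stage: return the k chunks most similar to the query."""
    vecs, idf = tfidf_vectors(chunks)
    qtf = Counter(tokenize(query))
    # Query terms unseen in the chunk collection carry no weight and are dropped.
    qvec = {t: c * idf[t] for t, c in qtf.items() if t in idf}
    ranked = sorted(range(len(chunks)), key=lambda i: cosine(qvec, vecs[i]), reverse=True)
    return [chunks[i] for i in ranked[:k]]


def build_prompt(section_title: str, passages: list[str]) -> str:
    """Hypothetical generation input: numbered sources followed by the writing instruction."""
    sources = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Write the survey section '{section_title}' using only these sources:\n{sources}"


if __name__ == "__main__":
    chunks = [
        "Dense retrieval encodes queries and passages with neural encoders.",
        "BM25 is a sparse lexical retrieval baseline.",
        "Convolutional networks classify images.",
    ]
    top = retrieve("dense neural retrieval", chunks, k=2)
    print(build_prompt("Neural Retrieval", top))
```

Changing `k`, or swapping the ranker for a stronger one, alters which passages ever reach the generator; this is one simple example of a configuration knob, and the paper's own configurations may differ.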

Code & Models

Datasets

unicamp-dl/SurveySum

Taxonomy

Topics: Computational and Text Analysis Methods