Ask, Retrieve, Summarize: A Modular Pipeline for Scientific Literature Summarization
Pierre Achkar, Tim Gollub, Martin Potthast

TL;DR
This paper introduces XSum, a modular pipeline that uses retrieval-augmented generation (RAG) for multi-document summarization of scientific literature, with the aim of making automated summaries more accurate and coherent.
Contribution
It presents an adaptable RAG framework for scientific literature summarization that combines a question-generation module with an editor module.
Findings
XSum outperforms existing methods on the SurveySum dataset.
The pipeline achieves higher CheckEval, G-Eval, and Ref-F1 scores than existing approaches.
The approach improves the quality and reliability of automated scientific summaries.
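This page names Ref-F1 but does not define it. A common reading, assumed here and not confirmed by the page, is a set-level F1 between the references cited in a generated summary and those cited in the gold survey section. The sketch below follows that assumption. The helper `cited_keys` is hypothetical and assumes citations appear as bracketed keys such as `[A]`.

```python
import re


def cited_keys(text: str) -> set[str]:
    """Hypothetical helper: collect bracketed citation keys like [A] from a summary."""
    return set(re.findall(r"\[([^\[\]]+)\]", text))


def ref_f1(generated: set[str], gold: set[str]) -> float:
    """Set-level F1 between cited and gold reference keys (assumed definition of Ref-F1)."""
    hits = len(generated & gold)
    if hits == 0:
        # Covers empty sets as well as disjoint ones.
        return 0.0
    precision = hits / len(generated)  # share of cited references that are correct
    recall = hits / len(gold)          # share of gold references that were cited
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    summary = "Dense retrieval helps [A]. Citations matter [B]. Surveys grow [C]."
    print(ref_f1(cited_keys(summary), {"A", "B", "D", "E"}))
```

With three cited keys, two of which appear among four gold keys, precision is 2/3 and recall is 1/2, giving an F1 of 4/7.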
Abstract
The exponential growth of scientific publications has made it increasingly difficult for researchers to stay updated and synthesize knowledge effectively. This paper presents XSum, a modular pipeline for multi-document summarization (MDS) in the scientific domain using Retrieval-Augmented Generation (RAG). The pipeline includes two core components: a question-generation module and an editor module. The question-generation module dynamically generates questions adapted to the input papers, ensuring the retrieval of relevant and accurate information. The editor module synthesizes the retrieved content into coherent and well-structured summaries that adhere to academic standards for proper citation. Evaluated on the SurveySum dataset, XSum demonstrates strong performance, achieving considerable improvements in metrics such as CheckEval, G-Eval and Ref-F1 compared to existing approaches.…
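The abstract describes a three-stage flow: generate questions from the input papers, retrieve relevant passages, then let an editor synthesize a cited summary. The sketch below shows only that control flow, under stated assumptions. XSum uses LLM-based modules, and every function here (`ask`, `retrieve`, `summarize`, `pipeline`) is a hypothetical, deliberately naive stand-in. Questions are built from each paper's most frequent long words, retrieval is plain token overlap, and the "editor" deduplicates passages and appends a bracketed citation key to each sentence.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    paper_id: str  # citation key of the source paper
    text: str      # one sentence of that paper


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens, used for naive lexical matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(text: str) -> list[str]:
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def ask(topic: str, papers: dict[str, str], k: int = 3) -> list[str]:
    """Hypothetical question generator: one question per paper, built from
    that paper's k most frequent words longer than four letters."""
    questions = []
    for text in papers.values():
        words = [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 4]
        top = [w for w, _ in Counter(words).most_common(k)]
        questions.append(f"What is known about {topic} regarding {', '.join(top)}?")
    return questions


def retrieve(question: str, chunks: list[Chunk], n: int = 2) -> list[Chunk]:
    """Rank chunks by token overlap with the question (a stand-in for a real
    retriever) and keep the top n that share at least one token."""
    q = tokens(question)
    scored = [(len(q & tokens(c.text)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep order
    return [c for score, c in scored[:n] if score > 0]


def summarize(retrieved: list[Chunk]) -> str:
    """Hypothetical editor: drop duplicate chunks, keep first-seen order,
    and attach a bracketed citation key to every sentence."""
    seen, parts = set(), []
    for c in retrieved:
        if c not in seen:
            seen.add(c)
            parts.append(f"{c.text.rstrip('.')} [{c.paper_id}].")
    return " ".join(parts)


def pipeline(topic: str, papers: dict[str, str]) -> str:
    """Ask -> Retrieve -> Summarize over sentence-level chunks of the input papers."""
    chunks = [Chunk(pid, s) for pid, text in papers.items() for s in split_sentences(text)]
    retrieved = [c for q in ask(topic, papers) for c in retrieve(q, chunks)]
    return summarize(retrieved)


if __name__ == "__main__":
    papers = {
        "A": "Dense retrievers encode queries and passages. Dense retrievers improve recall.",
        "B": "Citation generation links claims to sources. Citation accuracy matters for surveys.",
    }
    print(pipeline("retrieval", papers))
```

Keeping each stage behind its own function mirrors the modularity the abstract emphasizes: any stage could be swapped for an LLM call or a dense retriever without touching the others.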
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques
