Divide and summarize: improve SLM text summarization
Alexandre Bailly, Antoine Saubin, Gabriel Kocevar, Jonathan Bodin

TL;DR
This paper compares two text summarization methods for small language models, finding that the 'Map' method improves accuracy and avoids losing information from the middle of texts.
Contribution
The study evaluates the 'Map' method against the 'Stuff' method for SLM-based summarization and finds that Map performs better.
Findings
The Map method retains key facts from the beginning and middle of texts better than the Stuff method.
SLMs using the Map method achieved performance comparable to LLMs using the Stuff method.
The Map method effectively addresses the 'Lost in the Middle' problem in SLM summarization.
Abstract
Text summarization is a longstanding challenge in natural language processing, with recent advancements driven by the adoption of Large Language Models (LLMs) and Small Language Models (SLMs). Despite these developments, issues such as the “Lost in the Middle” problem—where LLMs tend to overlook information in the middle of lengthy prompts—persist. Traditional summarization, often termed the “Stuff” method, processes an entire text in a single pass. In contrast, the “Map” method divides the text into segments, summarizes each independently, and then synthesizes these partial summaries into a final output, potentially mitigating the “Lost in the Middle” issue. This study investigates whether the Map method outperforms the Stuff method for texts that fit within the context window of SLMs and assesses its effectiveness in addressing the “Lost in the Middle” problem. We conducted a…
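The two strategies described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the word-based segmentation, the default segment size, and the toy `keep_upper` summarizer are all assumptions made for the example. In practice `summarize` would wrap a call to an SLM or LLM, and the excerpt does not specify how the paper segments the text or prompts the model.

```python
from typing import Callable, List

# Any function that maps a text to a shorter text (hypothetical interface).
Summarizer = Callable[[str], str]


def split_into_segments(text: str, max_words: int) -> List[str]:
    """Split text into consecutive segments of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def stuff_summarize(text: str, summarize: Summarizer) -> str:
    """'Stuff': summarize the entire text in a single pass."""
    return summarize(text)


def map_summarize(text: str, summarize: Summarizer, max_words: int = 200) -> str:
    """'Map': summarize each segment independently, then synthesize
    the partial summaries into one final summary."""
    segments = split_into_segments(text, max_words)
    if not segments:
        return ""
    partial_summaries = [summarize(segment) for segment in segments]
    if len(partial_summaries) == 1:
        # Only one segment: nothing to synthesize.
        return partial_summaries[0]
    return summarize("\n".join(partial_summaries))


def keep_upper(text: str) -> str:
    """Toy stand-in for a model call: keeps only fully uppercase words."""
    return " ".join(word for word in text.split() if word.isupper())


if __name__ == "__main__":
    doc = "intro ALPHA filler filler filler BETA filler filler GAMMA end"
    print(stuff_summarize(doc, keep_upper))
    print(map_summarize(doc, keep_upper, max_words=4))
```

The toy summarizer only exercises the control flow; it does not reproduce the "Lost in the Middle" effect, which depends on how a real model attends to long prompts. Note that the Map variant makes one model call per segment plus one synthesis call, so it trades extra calls for shorter prompts.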
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
