Markov-Enhanced Clustering for Long Document Summarization: Tackling the 'Lost in the Middle' Challenge with Large Language Models
Aziz Amari (1), Mohamed Achref Ben Ammar (1) ((1) National Institute of Applied Science and Technology (INSAT), University of Carthage, Tunis, Tunisia)

TL;DR
This paper introduces a hybrid summarization method that combines extractive and abstractive techniques, using clustering and Markov chains to improve the coherence and key information retention in long document summaries.
Contribution
The paper presents a novel hybrid approach that enhances long document summarization by integrating clustering, Markov chains, and combined extractive-abstractive methods.
Findings
Improved retention of key information in long summaries
Enhanced coherence through Markov chain-based idea sequencing
Effective handling of lengthy documents with hybrid techniques
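The findings credit Markov chain-based idea sequencing with the coherence gains, but the exact formulation is not given here. The following is a minimal sketch under our own assumptions, not the authors' method. Each chunk is assumed to already carry a cluster label (one key idea). A first-order transition matrix is estimated from how the document moves between clusters, and the clusters are then ordered by a greedy walk along the most probable transitions. The function names `transition_matrix` and `order_clusters` are hypothetical.

```python
from collections import defaultdict


def transition_matrix(labels):
    """Estimate first-order Markov transition probabilities between clusters
    from the cluster labels of consecutive chunks, in document order.
    Self-transitions are skipped, so only moves between ideas are counted."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(labels, labels[1:]):
        if a != b:
            counts[a][b] += 1
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }


def order_clusters(labels):
    """Greedy walk over the estimated chain (an assumed ordering rule).
    Start at the first chunk's cluster, then move to the most probable
    unvisited successor. Ties, and clusters with no unvisited successor,
    fall back to order of first appearance in the document."""
    if not labels:
        return []
    probs = transition_matrix(labels)
    first_seen = list(dict.fromkeys(labels))
    order = [first_seen[0]]
    while len(order) < len(first_seen):
        successors = {
            b: p for b, p in probs.get(order[-1], {}).items() if b not in order
        }
        if successors:
            # Highest probability wins; on a tie, the earlier-seen cluster wins.
            nxt = max(successors, key=lambda b: (successors[b], -first_seen.index(b)))
        else:
            nxt = next(c for c in first_seen if c not in order)
        order.append(nxt)
    return order
```

For the label sequence `[0, 1, 0, 2, 0, 2, 1]`, `order_clusters` returns `[0, 2, 1]` rather than the first-appearance order `[0, 1, 2]`, because two of the three transitions out of cluster 0 go to cluster 2. A greedy walk is only one way to read "sequencing" off a chain; a most-likely-path search would be a heavier alternative.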
Abstract
The rapid expansion of information from diverse sources has heightened the need for effective automatic text summarization, which condenses documents into shorter, coherent texts. Summarization methods generally fall into two categories: extractive, which selects key segments from the original text, and abstractive, which generates summaries by rephrasing the content coherently. Large language models have advanced the field of abstractive summarization, but they are resource-intensive and face significant challenges in retaining key information across lengthy documents, which we call being "lost in the middle". To address these issues, we propose a hybrid summarization approach that combines extractive and abstractive techniques. Our method splits the document into smaller text chunks, clusters their vector embeddings, generates a summary for each cluster that represents a key idea in…
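The chunk-and-cluster stage described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Chunks are built from whole sentences up to a word budget. The "embedding" is a hypothetical stand-in (a hashed bag of words) where a real pipeline would call a sentence-embedding model. Clustering uses plain k-means with a deterministic farthest-first start. The helpers `chunk_text`, `embed`, `kmeans` and `group_by_cluster`, and their parameters, are all hypothetical. The per-cluster LLM summarization step is not shown.

```python
import math
import re
import zlib
from collections import Counter


def chunk_text(text, max_words=50):
    """Split text into chunks of whole sentences of at most about max_words
    words (a single longer sentence becomes its own chunk)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence]).split()) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def embed(chunk, dim=64):
    """Placeholder embedding: hashed bag of words, L2-normalised.
    crc32 is used instead of hash() so results are stable across runs."""
    vec = [0.0] * dim
    for word, count in Counter(re.findall(r"[a-z]+", chunk.lower())).items():
        vec[zlib.crc32(word.encode()) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialisation.
    Returns one cluster label per input vector."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_by_cluster(chunks, labels):
    """Map each cluster label to its chunks, preserving document order.
    Each group is what would be handed to a summarizer as one key idea."""
    groups = {}
    for chunk, label in zip(chunks, labels):
        groups.setdefault(label, []).append(chunk)
    return groups
```

A typical call chain is `chunks = chunk_text(doc, max_words=120)`, then `labels = kmeans([embed(c) for c in chunks], k=5)`, then `group_by_cluster(chunks, labels)`. The labels come out in document order, which is also the input a Markov-style ordering of the clusters would need. The choice of `k` and the chunk size are free parameters here, and the abstract does not say how the authors set them.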
