Context-Aware Hierarchical Merging for Long Document Summarization

Litu Ou; Mirella Lapata

arXiv:2502.00977·cs.CL·August 11, 2025

TL;DR

This paper enhances hierarchical merging for long document summarization by integrating source context to reduce hallucinations and improve factual accuracy, demonstrating superior results on legal and narrative datasets.

Contribution

The paper introduces three context augmentation techniques (replacement, refinement, and alignment) to improve the factual reliability of hierarchical merging in long document summarization.

Findings

01. Contextual augmentation outperforms baseline methods.

02. Refinement works best with extractive summarization.

03. Methods reduce hallucinations and improve factual accuracy.

Abstract

Hierarchical Merging is a technique commonly used to summarize very long texts (>100K tokens) by breaking down the input into smaller sections, summarizing those sections individually, and then merging or combining those summaries into a final coherent summary. Although it helps address the limitations of large language models (LLMs) with fixed input length constraints, the recursive merging process can amplify LLM hallucinations, increasing the risk of factual inaccuracies. In this paper, we seek to mitigate hallucinations by enriching hierarchical merging with context from the source document. Specifically, we propose different approaches to contextual augmentation ranging from replacing intermediate summaries with relevant input context, to refining them while using the context as supporting evidence, and aligning them implicitly (via citations) to the input.…
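
The split, summarize, and merge loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `hierarchical_merge`, `split_into_chunks`, `summarize`, `augment`, `chunk_size`, and `fan_in` are all hypothetical, whitespace splitting stands in for a real tokenizer, and `summarize` is a placeholder for whatever LLM call is used. The optional `augment` hook only marks where source context could be injected before a merge step; how the paper actually selects and uses that context is not specified here.

```python
from typing import Callable, List, Optional


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[str]:
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [
        " ".join(tokens[i:i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]


def hierarchical_merge(
    document: str,
    summarize: Callable[[str], str],
    chunk_size: int = 1000,
    fan_in: int = 2,
    augment: Optional[Callable[[str, List[str]], str]] = None,
) -> str:
    """Summarize a long document by recursively merging chunk summaries.

    summarize: hypothetical stand-in for an LLM summarization call.
    augment:   hypothetical hook that receives the text about to be merged
               plus the original source chunks and returns enriched text.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")

    # Whitespace tokenization is an assumption made for illustration only.
    chunks = split_into_chunks(document.split(), chunk_size)
    if not chunks:
        return ""

    # Level 0: summarize every chunk independently.
    summaries = [summarize(chunk) for chunk in chunks]

    # Higher levels: merge groups of fan_in summaries until one remains.
    while len(summaries) > 1:
        merged = []
        for i in range(0, len(summaries), fan_in):
            text = "\n".join(summaries[i:i + fan_in])
            if augment is not None:
                text = augment(text, chunks)
            merged.append(summarize(text))
        summaries = merged

    return summaries[0]
```

Each pass through the `while` loop summarizes text that is itself model output, which is where errors can compound across levels; the `augment` hook is the point at which a context-aware variant would bring the source document back in.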

Taxonomy

Topics: Semantic Web and Ontologies · Topic Modeling · Data Quality and Management