Text summarization via global structure awareness
Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Yibei Liu, Chenghao Li, Qigan Sun, Shuai Yuan, Fachrina Dewi Puspitasari, Dongshen Han, Guoqing Wang, Sung-Ho Bae, Yang Yang

TL;DR
This paper introduces GloSA-sum, a novel text summarization method that uses topological data analysis to efficiently preserve global structure and coherence in long documents.
Contribution
GloSA-sum is the first summarization approach to incorporate global structure awareness via TDA, improving coherence and efficiency over existing methods.
Findings
Reduces redundancy in summaries
Preserves semantic and logical structures
Enhances downstream LLM tasks
Abstract
Text summarization is a fundamental task in natural language processing (NLP), and the information explosion has made long-document processing increasingly demanding, making summarization essential. Existing research mainly focuses on model improvements and sentence-level pruning, but often overlooks global structure, leading to disrupted coherence and weakened downstream performance. Some studies employ large language models (LLMs), which achieve higher accuracy but incur substantial resource and time costs. To address these issues, we introduce GloSA-sum, the first summarization approach that achieves global structure awareness via topological data analysis (TDA). GloSA-sum summarizes text efficiently while preserving semantic cores and logical dependencies. Specifically, we construct a semantic-weighted graph from sentence embeddings, where persistent homology identifies core…
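To make the graph-plus-persistent-homology idea in the abstract more concrete, here is a minimal sketch of 0-dimensional persistence on a complete graph weighted by cosine distance between sentence embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function names `cosine_distance` and `h0_persistence`, the toy 2-D vectors, and the choice of cosine distance as the edge weight are all hypothetical, and the paper's actual pipeline (including any higher-dimensional features) may differ.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def h0_persistence(embeddings):
    """Return the death times of H0 classes, in ascending order.

    Each sentence starts as its own connected component at distance 0.
    Edges are added in order of increasing weight; whenever an edge joins
    two different components, one of them "dies" at that edge weight.
    """
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (cosine_distance(embeddings[i], embeddings[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(weight)
    return deaths


# Hypothetical toy "sentence embeddings": two tight groups of directions.
toy_embeddings = [
    [1.0, 0.0], [0.99, 0.05], [0.98, 0.1],
    [0.0, 1.0], [0.05, 0.99],
]
deaths = h0_persistence(toy_embeddings)
```

In this toy input, three merges happen at very small distances (sentences within a group) and the last merge happens at a much larger distance (the two groups joining). A long-lived component of this kind is one plausible reading of a "core" semantic cluster, though how GloSA-sum selects and protects such structures is not specified in the excerpt above.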
Peer Reviews
Decision: ICLR 2026 Poster
- The paper is among the first to employ topological data analysis for text summarization, explicitly modeling and preserving semantic clusters and logical dependencies.
- Tables 2 and 3 provide a thorough analysis of computational complexity and runtime, demonstrating efficiency gains from the proposed one-time Protected Pool mechanism and hierarchical pipeline. The method achieves a favorable balance between computational cost and performance, which is particularly valuable for long-document s
1. **Unclear presentation and interpretation of main results:** The results in Table 1 demonstrate competitive performance, but do not clearly establish consistent advantages over strong baselines across all datasets and metrics. The improvements are often modest and inconsistently distributed across evaluation metrics. The paper would be strengthened by statistical significance testing to confirm that observed gains are reliable rather than artifacts of random variation. Moreover, the interpret
1. The paper’s primary strength is its novel use of Topological Data Analysis (TDA) to formally model a document's global structure. This allows the method to identify core semantic themes and logical connections in a principled way, moving beyond traditional local similarity graphs.
2. The framework is cleverly designed to be both highly effective and computationally efficient. The one-time TDA analysis and "Protected Pool" mechanism avoid costly repeated calculations, making the method scalabl
1. The paper posits that H1 cycles correspond to "logical loops" or "recurrent argumentative structures." While this is a compelling intuition, the connection is not explicitly demonstrated. The work would be significantly strengthened by a qualitative analysis that visualizes a few high-persistence H1 cycles from the data and shows the exact sentences that form them, explaining how they constitute a logical loop. Without this, the interpretation remains a plausible but unproven claim.
2. The en
- First work to apply TDA to summarization.
- The Protected Pool + proxy scoring design avoids repeated TDA, which enables scalability.
- High QAFactEval scores suggest better factual consistency than many abstractive models.
- The paper says it is the first to bring TDA into summarization. That might be acceptable wording, but a lot of what is actually done after the TDA step looks like a graph-based extractive summarizer with a protected set plus shortest-path-based importance.
- The ablation study removes the Protected Pool but does not compare against alternative global-structure-aware summarizers (e.g., graph-based methods with community detection, discourse parsers, or transformer-based long-range attention pr
Taxonomy
Topics: Topological and Geometric Data Analysis · Advanced Graph Neural Networks · Machine Learning in Healthcare
