GOSU: Retrieval-Augmented Generation with Global-Level Optimized Semantic Unit-Centric Framework
Xuecheng Zou, Ke Liu, Bingbing Wang, Huafei Deng, Li Zhang, Yu Tang

TL;DR
GOSU introduces a global-level semantic unit-centric framework that enhances retrieval-augmented generation by capturing interconnections across global context, improving generation quality over traditional RAG methods.
Contribution
The paper proposes GOSU, a novel framework that performs global disambiguation and captures interconnections between semantic units across text chunks, addressing limitations of local extraction methods.
Findings
GOSU outperforms baseline RAG methods in multiple tasks.
Hierarchical keyword extraction improves fine-grained relationship uncovering.
Semantic unit completion compensates for missing relationships.
Table 1: Win rates of each baseline versus GOSU across five domains.

| | Agriculture | | CS | | Hypertension | | Legal | | Mix | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Metric | NaiveRAG | GOSU | NaiveRAG | GOSU | NaiveRAG | GOSU | NaiveRAG | GOSU | NaiveRAG | GOSU | Avg gap |
| Comprehensiveness | 12.0% | 88.0% | 14.3% | 85.7% | 11.1% | 88.9% | 12.0% | 88.0% | 23.1% | 76.9% | +71.0% |
| Diversity | 26.8% | 73.2% | 29.7% | 70.3% | 24.4% | 75.6% | 16.7% | 83.3% | 22.9% | 77.1% | +51.8% |
| Empowerment | 11.5% | 88.5% | 12.0% | 88.0% | 18.2% | 81.8% | 15.9% | 84.1% | 20.0% | 80.0% | +69.0% |
| Overall | 10.0% | 90.0% | 12.5% | 87.5% | 11.2% | 88.5% | 10.0% | 90.0% | 22.7% | 77.3% | +73.4% |
| Metric | LightRAG | GOSU | LightRAG | GOSU | LightRAG | GOSU | LightRAG | GOSU | LightRAG | GOSU | Avg gap |
| Comprehensiveness | 8.8% | 91.2% | 20.0% | 80.0% | 25.0% | 75.0% | 29.4% | 70.6% | 41.7% | 58.3% | +50.0% |
| Diversity | 23.6% | 76.4% | 43.5% | 56.5% | 39.0% | 61.0% | 47.9% | 52.1% | 48.8% | 51.2% | +18.9% |
| Empowerment | 6.8% | 93.2% | 23.1% | 76.9% | 29.5% | 70.5% | 35.3% | 64.7% | 29.6% | 70.4% | +50.3% |
| Overall | 5.8% | 94.2% | 20.8% | 79.2% | 22.2% | 77.8% | 34.5% | 65.5% | 31.2% | 68.8% | +54.2% |
| Metric | HiRAG | GOSU | HiRAG | GOSU | HiRAG | GOSU | HiRAG | GOSU | HiRAG | GOSU | Avg gap |
| Comprehensiveness | 14.3% | 85.7% | 42.9% | 57.1% | 20.0% | 80.0% | 33.3% | 66.7% | 33.3% | 66.7% | +42.5% |
| Diversity | 36.8% | 63.2% | 46.1% | 53.9% | 47.3% | 52.7% | 48.9% | 51.1% | 48.2% | 51.8% | +9.1% |
| Empowerment | 32.0% | 68.0% | 39.3% | 60.7% | 34.1% | 65.9% | 43.3% | 56.7% | 48.0% | 52.0% | +21.3% |
| Overall | 23.8% | 76.2% | 39.1% | 60.9% | 29.0% | 71.0% | 41.3% | 58.7% | 45.5% | 54.5% | +28.5% |
| Metric | HyperGraphRAG | GOSU | HyperGraphRAG | GOSU | HyperGraphRAG | GOSU | HyperGraphRAG | GOSU | HyperGraphRAG | GOSU | Avg gap |
| Comprehensiveness | 5.9% | 94.1% | 3.7% | 96.3% | 6.5% | 93.5% | 5.6% | 94.4% | 7.7% | 92.3% | +88.2% |
| Diversity | 26.2% | 73.8% | 14.6% | 85.4% | 29.8% | 70.2% | 17.9% | 82.1% | 21.5% | 78.5% | +56.0% |
| Empowerment | 7.9% | 92.1% | 12.0% | 88.0% | 27.5% | 72.5% | 21.4% | 78.6% | 5.0% | 95.0% | +70.5% |
| Overall | 3.4% | 96.6% | 12.5% | 87.5% | 20.8% | 79.2% | 7.9% | 83.3% | 5.6% | 94.4% | +78.2% |
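
Table 2: Ablation results (win rates of each ablated variant versus full GOSU).

| | Agriculture | | CS | | Hypertension | | Legal | | Mix | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Construction | | | | | | | | | | | |
| Metric | w/o GO | GOSU | w/o GO | GOSU | w/o GO | GOSU | w/o GO | GOSU | w/o GO | GOSU | Avg gap |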
| Comprehensiveness | 42.9% | 57.1% | 41.7% | 58.3% | 25.0% | 75.0% | 14.3% | 85.7% | 20.0% | 80.0% | +42.4% |
| Diversity | 46.7% | 53.3% | 48.7% | 51.3% | 48.1% | 51.9% | 40.3% | 59.7% | 49.4% | 50.6% | +6.7% |
| Empowerment | 46.4% | 53.6% | 42.1% | 57.9% | 42.9% | 57.1% | 22.7% | 77.3% | 40.9% | 59.1% | +22.0% |
| Overall | 41.7% | 58.3% | 41.2% | 58.8% | 32.4% | 67.6% | 15.8% | 84.2% | 37.5% | 62.5% | +32.6% |
| Retrieval & Generation | | | | | | | | | | | |
| Metric | w/o EL | GOSU | w/o EL | GOSU | w/o EL | GOSU | w/o EL | GOSU | w/o EL | GOSU | Avg gap |
| Comprehensiveness | 46.7% | 53.3% | 38.5% | 61.5% | 47.1% | 52.9% | 37.5% | 62.5% | 46.2% | 53.8% | +13.6% |
| Diversity | 45.4% | 54.6% | 41.6% | 58.4% | 48.3% | 51.7% | 45.5% | 54.5% | 43.5% | 56.5% | +10.3% |
| Empowerment | 44.4% | 55.6% | 36.0% | 64.0% | 44.6% | 55.4% | 47.4% | 52.6% | 43.5% | 56.5% | +13.6% |
| Overall | 45.2% | 54.8% | 42.9% | 57.1% | 46.9% | 53.1% | 47.1% | 52.9% | 45.0% | 55.0% | +9.2% |
| Metric | w/o RL | GOSU | w/o RL | GOSU | w/o RL | GOSU | w/o RL | GOSU | w/o RL | GOSU | Avg gap |
| Comprehensiveness | 45.5% | 54.5% | 44.4% | 55.6% | 33.3% | 66.7% | 36.4% | 63.6% | 47.1% | 52.9% | +17.3% |
| Diversity | 44.9% | 55.1% | 46.3% | 53.7% | 40.5% | 59.5% | 45.8% | 54.2% | 39.2% | 60.8% | +13.3% |
| Empowerment | 45.7% | 54.3% | 40.0% | 60.0% | 37.3% | 62.7% | 48.0% | 52.0% | 41.2% | 58.8% | +15.1% |
| Overall | 45.8% | 54.2% | 46.7% | 53.3% | 35.1% | 64.9% | 45.0% | 55.0% | 40.9% | 59.1% | +14.6% |
| Metric | w/o EL & RL | GOSU | w/o EL & RL | GOSU | w/o EL & RL | GOSU | w/o EL & RL | GOSU | w/o EL & RL | GOSU | Avg gap |
| Comprehensiveness | 33.3% | 66.7% | 35.7% | 64.3% | 31.2% | 68.8% | 43.7% | 56.3% | 36.8% | 63.2% | +27.7% |
| Diversity | 32.3% | 67.7% | 38.7% | 61.3% | 31.7% | 68.3% | 46.3% | 53.7% | 34.7% | 65.3% | +26.5% |
| Empowerment | 30.6% | 69.4% | 37.5% | 62.5% | 43.1% | 56.9% | 48.1% | 51.9% | 37.9% | 62.1% | +21.1% |
| Overall | 35.7% | 64.3% | 27.8% | 72.2% | 42.4% | 57.6% | 45.8% | 54.2% | 33.3% | 66.7% | +26.0% |
| Metric | w/o SL | GOSU | w/o SL | GOSU | w/o SL | GOSU | w/o SL | GOSU | w/o SL | GOSU | Avg gap |
| Comprehensiveness | 45.5% | 54.5% | 22.2% | 77.8% | 35.3% | 64.7% | 42.9% | 57.1% | 18.2% | 81.8% | +34.4% |
| Diversity | 38.3% | 61.7% | 43.4% | 56.6% | 44.8% | 55.2% | 42.2% | 57.8% | 39.2% | 60.8% | +16.8% |
| Empowerment | 38.1% | 61.9% | 32.0% | 68.0% | 38.5% | 61.5% | 34.8% | 65.2% | 36.0% | 64.0% | +28.2% |
| Overall | 38.9% | 61.1% | 25.0% | 75.0% | 39.4% | 60.6% | 36.8% | 63.2% | 38.1% | 61.9% | +28.7% |
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Advanced Graph Neural Networks
GOSU: Retrieval-Augmented Generation with Global-Level Optimized Semantic Unit-Centric Framework
Xuecheng Zou, Ke Liu, Bingbing Wang, Huafei Deng, Li Zhang, Yu Tang
School of Future Science and Engineering, Soochow University
School of Mathematical Sciences, Soochow University
School of Information Technology (Smart Campus Education Center),
Suzhou Institute of Trade & Commerce
{xczouxczou, bbwangstat1, 20245258046, 20245258048}@stu.suda.edu.cn
[email protected] [email protected] Corresponding author: [email protected].
Abstract
Building upon the standard graph-based Retrieval-Augmented Generation (RAG), the introduction of heterogeneous graphs and hypergraphs aims to enrich retrieval and generation by leveraging the relationships between multiple entities through the concept of semantic units (SUs). However, this raises a key issue: when the extraction of high-level SUs is limited to local text chunks, it is prone to ambiguity, complex coupling, and increased retrieval overhead, owing to the lack of global knowledge or the neglect of fine-grained relationships. To address these issues, we propose GOSU, a semantic unit-centric RAG framework that efficiently performs global disambiguation and utilizes SUs to capture interconnections between different nodes across the global context. In the graph construction phase, GOSU performs global merging on the pre-extracted SUs from local text chunks and guides entity and relationship extraction, reducing the difficulty of coreference resolution while uncovering global semantic objects across text chunks. In the retrieval and generation phase, we introduce hierarchical keyword extraction and semantic unit completion. The former uncovers the fine-grained binary relationships overlooked by the latter, while the latter compensates for the coarse-grained $n$-ary relationships missing from the former. Evaluation across multiple tasks demonstrates that GOSU outperforms the baseline RAG methods in terms of generation quality. Our code is available at https://github.com/xczouxczou/GOSU.
1 Introduction
With the explosion of data scale (Ouyang et al., 2022), the performance of large language models (LLMs) is improving rapidly (OpenAI et al., 2023; Touvron et al., 2023; Mei et al., 2025), yet their finite parameters still lead to frequent hallucinations (Mallen et al., 2022; Min et al., 2023; Ji et al., 2022; Huang et al., 2023). To this end, Retrieval-Augmented Generation (RAG) (Lewis et al., 2020; Gao et al., 2023b; Fan et al., 2024; Hu et al., 2025; Asai et al., 2023), which integrates external knowledge sources to enhance factual consistency and generation accuracy (Sudhi et al., 2024; Es et al., 2024; Salemi and Zamani, 2024; Zhao et al., 2023; Tu et al., 2024; Tonmoy et al., 2024; Shrestha et al., 2024; Liu et al., 2023), has emerged as a promising solution. In standard RAG methods, the simple approach of processing fixed-length text chunks often fails to capture direct or indirect relationships between entities, limiting its practicality in knowledge-intensive tasks (Pan et al., 2023; Luo et al., 2023; Wang et al., 2024b; Han et al., 2024; Wen et al., 2023).
Recently, graph-structured RAG methods have enhanced relational representation by incorporating knowledge graphs (Edge et al., 2024; Zhang et al., 2025a; Liang et al., 2025; Guo et al., 2024; Tian et al., 2024; Park et al., 2023; Jiménez Gutiérrez et al., 2024; He et al., 2024; Trajanoska et al., 2023; Sanmartin, 2024; Wang et al., 2024b; Rampášek et al., 2022), but they are constrained by the binary relations inherent in structuring natural language into graphs, preventing them from effectively modeling $n$-ary relations among multiple entities and thus limiting their performance on complex reasoning tasks (Wen et al., 2016). Current studies are exploring the introduction of heterogeneous graphs and hypergraphs to tackle this issue (Xu et al., 2025b; Luo et al., 2025a; Huang et al., 2025; Wang et al., 2025a; Ma et al., 2025; Mei et al., 2025). However, as illustrated in Fig. 1, decomposing events within isolated text chunks (Xu et al., 2025b) and over-emphasizing $n$-ary relations (Luo et al., 2025a) not only leads to information fragmentation and contextual discontinuity, but also neglects the precise representation of fine-grained relations and increases the complexity of information coupling. In other words, meeting this challenge requires optimizing the entire RAG pipeline, from knowledge graph construction through retrieval and generation, by integrating global context while balancing coarse-grained and fine-grained relations.
To address these shortcomings, we propose the GOSU framework, a RAG approach that refines semantic unit extraction at the global level and drives the entire pipeline around these semantic units (SUs). GOSU optimizes SUs at the global level through a multi-round semantic unit global merging strategy to prevent the relation fragmentation that can arise from relying on individual text chunks. Specifically, we leverage the LLM's natural language processing capabilities to identify SUs for each text block, so as to avoid the loss of critical semantic information. These identified SUs serve as pre-SUs, laying the foundation for subsequent disambiguation and deduplication merging, and ensuring semantic consistency across different text blocks. Unlike traditional graph methods based on binary relations or hyperedges, GOSU focuses on SUs: it uses semantic unit-centric connections during knowledge graph construction and retrieval to uncover coarse-grained $n$-ary relationships while preserving fine-grained binary relations among low-level entities, thus avoiding excessive information coupling.
Our contributions can be summarized as follows:
- **Global-Level Semantic Unit Optimization:** We propose a semantic unit global merging strategy that uses an LLM to extract SUs from each text block and then performs global disambiguation, deduplication, and merging, ensuring semantic consistency across chunks and avoiding the relationship fragmentation caused by local segmentation.
- **Semantic Unit-Centric Knowledge Graph Construction:** Diverging from traditional binary-relation or hyperedge approaches, we center the graph around SUs. This allows us to simultaneously capture coarse-grained $n$-ary relations and preserve fine-grained binary relations among underlying entities, achieving a balanced representation that mitigates over-coupling of information.
- **Dual-Phase Retrieval-Augmented Generation Framework:** We integrate hierarchical keyword extraction with SU completion in both the retrieval and generation stages: keyword extraction targets fine-grained entity/term retrieval, while SU completion fills in coarse-grained multi-entity SUs. The fusion of these components enhances contextual coverage and generation fidelity.
Experiments across multiple open knowledge-intensive domains demonstrate that GOSU achieves superior performance in authenticity, comprehensiveness, diversity, and empowerment (Guo et al., 2024; Qian et al., 2024). These results validate the global-level semantic unit-centric paradigm for graph construction, retrieval, and generation, and highlight its potential for real-world applications.
2 Related work
2.1 Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) grounds large language model (LLM) outputs in retrieval from external corpora, which aligns generation with trusted knowledge and reduces hallucinations (Huang et al., 2023; Niu et al., 2024; Bai et al., 2024; Bang et al., 2025) in knowledge-intensive tasks (Gao et al., 2023a). Subsequent improvements introduced joint training of passage-generation (Izacard et al., 2022) and multidimensional adaptive trade-offs (Min et al., 2021) to refine both query formulation and result filtering. Self-RAG and other variants (Trivedi et al., 2022; Shao et al., 2023; Asai et al., 2023; Yu et al., 2024) leverage the LLM itself within iterative retrieval–reasoning loops to generate follow-up queries and assess retrieved evidence, whereas DRAG (Hu et al., 2025) and FLARE (Jiang et al., 2023; Su et al., 2024) introduce multi-agent debates and token-triggered retrieval to further curb hallucination and reduce unnecessary context. Despite these advances, all of these "flat" RAG approaches rely on coarse chunking and simple retrieval strategies, which tend to fragment context and inject noise when handling nuanced multi-entity events.
2.2 Graph-Structured RAG
To better preserve relationships between entities, graph-structured RAG methods integrate knowledge graphs and graph algorithms into retrieval and generation (Kim et al., 2023; Peng et al., 2024; Xiang et al., 2025; Zhang et al., 2025b). Works such as GraphRAG (Edge et al., 2024; Jiang et al., 2024; Mavromatis and Karypis, 2024; He et al., 2024; Zhang et al., 2025a) propagate contextual signals across retrieved text via multi-round message passing, extracting entities and building binary relation graphs. Subsequently, KAG (Liang et al., 2025) and LightRAG (Guo et al., 2024) introduced confidence-based edge weighting mechanisms to prioritize and reinforce key relationships. GNPLLM and others (Tian et al., 2024; Shen et al., 2024; Barmettler et al., 2025; Xu et al., 2025a; Luo et al., 2025b) further fuse the learned graph embeddings with LLM representations, enriching the input features of downstream generation models. These methods have achieved significant accuracy improvements on fact-alignment tasks involving pairwise entity relationships. However, because they model only binary edges, they inherently struggle to capture $n$-ary relations that span three or more entities, resulting in the loss of key information in complex event scenarios involving multi-party interactions.
2.3 Heterogeneous and Hypergraph-Based Approaches
Moving beyond binary graphs, recent work has explored heterogeneous graph structures and hypergraphs to encode $n$-ary relations. NodeRAG (Xu et al., 2025b) represents events as isolated nodes enriched with detailed type information, allowing the model to distinguish among entities, semantic units, and high-level summaries. HyperGraphRAG (Luo et al., 2025a) further generalizes this approach by introducing hyperedges that directly connect multiple entities within a single relational fact, thereby preserving the integrity of $n$-ary events. HierarchicalRAG and similar systems (Chen et al., 2024; Huang et al., 2025; Jiao et al., 2025; Wang et al., 2025b; Zou et al., 2025) organize knowledge and enable hierarchical retrieval through multi-level or coarse-to-fine graph structures, while PikeRAG and others (Sun et al., 2019; Asai et al., 2019; Ma et al., 2022; Wang et al., 2025a; Wei et al., 2024; Six et al., 2025) align retrieval and generation more closely around the reasoning chain to highlight prominent multi-entity patterns in the graph. Although these methods substantially increase expressivity and multi-hop reasoning capabilities, they often fragment events across disconnected graph components, inflating the size and density of the index. This in turn raises retrieval overhead and can lead to incoherent generation when the model struggles to traverse highly entangled structures.
2.4 Semantic Unit Extraction and Global Context Modeling
Some prior efforts seek to preserve event coherence via chunk-level or sentence-level segmentation (Xu et al., 2025b; Michelmann et al., 2025; Yu et al., 2023; Brunato et al., 2023; Zhao et al., 2024; Liu et al., 2025; Ni et al., 2025) or overlapping sliding-window retrieval (Lewis et al., 2020; Izacard and Grave, 2020; Karpukhin et al., 2020; Wang et al., 2024a), but these remain fundamentally local strategies that lack corpus-wide consistency. No existing framework simultaneously (1) extracts semantically coherent units at a global level, (2) disambiguates and merges overlapping units across chunks, and (3) constructs a unified graph that balances coarse-grained $n$-ary relations with fine-grained binary links. Our GOSU framework fills this gap: it first leverages LLMs to identify and globally filter semantic units (SUs) across all text blocks, then builds an SU-centric graph that preserves both detailed entity links and richer multi-entity structures, and finally applies a dual-phase RAG pipeline (fine-grained keyword retrieval followed by SU completion) to achieve enhanced factuality, coherence, and coverage.
3 Methodology
In this section, we present global-level optimized semantic unit-centric RAG (GOSU), as illustrated in Figure 2, which comprises three core components: global semantic unit optimization, semantic unit-centric knowledge graph construction, and semantic unit-centric retrieval and generation. Detailed descriptions of each component follow in Subsections 3.1, 3.2 and 3.3, respectively.
3.1 Global-level Semantic Unit Optimization
To ensure consistency and completeness of semantic units at the global level, GOSU introduces a “Global Semantic Unit Optimization” module at the very beginning of the pipeline, comprising three steps: initial extraction, global filtering and disambiguation, and merging with deduplication.
Initial Extraction
The external corpus $\mathcal{D}$ consists of multiple documents. Each document is segmented into length-controlled, semantically coherent text chunks via a sliding-window algorithm, and each chunk is converted into its vector representation:
$$\mathcal{C} = \{c_1, c_2, \dots, c_N\} = \mathrm{Chunk}(\mathcal{D}) \tag{1}$$
$$\mathbf{v}_i = \mathrm{Embed}(c_i), \quad i = 1, \dots, N \tag{2}$$
For each text chunk $c_i$, we employ the selected LLM to extract a set of candidate semantic units:
$$S_i = \{s_{i,1}, s_{i,2}, \dots, s_{i,m_i}\} = \mathcal{F}_{\mathrm{ext}}(c_i) \tag{3}$$
where $s_{i,j}$ represents the $j$-th candidate unit in chunk $c_i$ (an event or concept that satisfies completeness, coherence, and information-bearing capacity), $m_i$ is the (chunk-dependent) number of candidates returned for $c_i$, and $\mathcal{F}_{\mathrm{ext}}$ is the LLM-based extraction procedure. Next, we merge the candidates from all chunks into a global pool:
$$\mathcal{S} = \bigcup_{i=1}^{N} S_i \tag{4}$$
where $N$ is the total number of chunks and $\mathcal{S}$ is the global candidate set.
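The initial extraction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `sliding_window_chunks` and `build_global_pool` are hypothetical, windows are measured in tokens with a fixed overlap, and the LLM extractor is replaced by an arbitrary callable.

```python
from typing import Callable, List, Tuple


def sliding_window_chunks(tokens: List[str], size: int, overlap: int) -> List[List[str]]:
    """Split a token list into windows of `size` tokens that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # the last window already reaches the end
            break
    return chunks


def build_global_pool(
    chunks: List[List[str]],
    extract: Callable[[List[str]], List[str]],
) -> List[Tuple[int, str]]:
    """Pool candidate units from every chunk, keeping the source chunk index.

    `extract` stands in for the LLM-based extractor (an assumption of this sketch).
    """
    pool = []
    for chunk_id, chunk in enumerate(chunks):
        for unit in extract(chunk):
            pool.append((chunk_id, unit))
    return pool
```

Keeping the chunk index next to each candidate is what later allows a semantic unit to be traced back to its supporting chunks.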
Global Filtering
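We first conduct a coarse filtering step based on cosine similarity. Given a similarity threshold $\tau$, we form candidate semantic-unit pairs and then refine them with the LLM to obtain fine-grained filtering:
$$P_{\mathrm{coarse}} = \{(s_a, s_b) \mid s_a, s_b \in \mathcal{S},\ s_a \neq s_b,\ \phi_{\tau}(s_a, s_b) = 1\} \tag{5}$$
$$P_{\mathrm{fine}} = \{(s_a, s_b) \in P_{\mathrm{coarse}} \mid \mathcal{F}_{\mathrm{eval}}(s_a, s_b) = 1\} \tag{6}$$
where $\mathcal{S}$ is the global candidate set of semantic units, $(s_a, s_b)$ is an ordered pair of distinct units, $\phi_{\tau}$ is a cosine-similarity-based binary decision function with threshold $\tau$, and $\mathcal{F}_{\mathrm{eval}}$ denotes the LLM-based evaluator applied to pairs surviving the coarse stage.
Finally, we cluster and deduplicate the fine-grained pairs to obtain the refined semantic-unit set:
$$\mathcal{S}' = \mathrm{Dedup}(\mathrm{Cluster}(P_{\mathrm{fine}})) \tag{7}$$
where $\mathrm{Cluster}$ groups highly similar units into clusters and $\mathrm{Dedup}$ removes redundant elements within or across clusters.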
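The two-stage filter and the clustering step can be illustrated with a short sketch. It rests on several assumptions that are not taken from the paper: pairs are treated as unordered, clusters are formed with a union-find structure, each cluster is represented by its earliest unit, and the LLM evaluator is passed in as the hypothetical callable `llm_same`.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def filter_and_dedup(
    units: List[str],
    vectors: List[Sequence[float]],
    tau: float,
    llm_same: Callable[[str, str], bool],
) -> List[str]:
    """Coarse cosine filter, fine LLM check, then one representative per cluster."""
    # Coarse stage: keep only pairs whose similarity reaches the threshold.
    coarse = [
        (i, j)
        for i, j in combinations(range(len(units)), 2)
        if cosine(vectors[i], vectors[j]) >= tau
    ]
    # Fine stage: the (stubbed) LLM evaluator confirms each surviving pair.
    fine = [(i, j) for i, j in coarse if llm_same(units[i], units[j])]

    # Cluster confirmed pairs with union-find; the smallest index is the root.
    parent = list(range(len(units)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in fine:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[max(root_i, root_j)] = min(root_i, root_j)

    return [units[i] for i in range(len(units)) if find(i) == i]
```

The cheap cosine test keeps the number of LLM calls proportional to the number of near-duplicate pairs rather than to all pairs in the pool.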
Disambiguation and Merging with Deduplication
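In addition to retrieving the corresponding text chunks via their identifiers, we augment the retrieval process with vector-similarity search, thereby providing the LLM with sufficient evidence to interpret and integrate the semantic units.
$$\mathcal{C}_{\mathrm{id}}(s) = \{c \in \mathcal{C} \mid \mathrm{id}(c) \in I_s\} \tag{8}$$
$$\mathcal{C}_{\mathrm{sim}}(s) = \mathrm{Sim}(s, \mathcal{C}) \tag{9}$$
Here $\mathcal{C}$ denotes the full chunk set, $s \in \mathcal{S}'$ is a refined semantic unit, $\mathrm{id}(c)$ gives the chunk identifier of $c$, $I_s$ is the set of chunk IDs associated with $s$, and $\mathrm{Sim}(s, \mathcal{C})$ returns the similarity-ranked neighbors of $s$ within $\mathcal{C}$.
To avoid excessive retrieval, we prioritize ID-based lookup and then supplement it with similarity-based retrieval. The combined set is trimmed to a bounded size:
$$\mathcal{C}_s = \mathrm{Top}_k\big(\mathcal{C}_{\mathrm{id}}(s) \cup \mathcal{C}_{\mathrm{sim}}(s)\big), \quad |\mathcal{C}_s| \le k \tag{10}$$
where $\mathrm{Top}_k$ restricts the number of retrieved chunks, $k$ is a predefined retrieval threshold, and $|\cdot|$ returns the cardinality of a set.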
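A minimal sketch of the ID-first evidence gathering described above, assuming chunk identifiers are integers and that the similarity search has already returned a ranked list. The function name `gather_evidence` is hypothetical.

```python
from typing import List


def gather_evidence(id_chunks: List[int], sim_ranked: List[int], k: int) -> List[int]:
    """Take ID-linked chunks first, then fill up with similarity-ranked ones, up to k."""
    selected: List[int] = []
    for chunk_id in list(id_chunks) + list(sim_ranked):
        if len(selected) >= k:
            break
        if chunk_id not in selected:  # skip chunks already chosen via ID lookup
            selected.append(chunk_id)
    return selected
```

For example, with ID-linked chunks `[3, 7]`, similarity-ranked neighbors `[7, 1, 9, 4]`, and `k = 4`, the call returns `[3, 7, 1, 9]`: the ID-based hits are kept first and the duplicate `7` is not counted twice.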
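Finally, the retrieved text is integrated, and the LLM is used to globally refine the semantic units:
$$\mathcal{S}^{*} = \mathcal{F}_{\mathrm{ref}}\big(\mathcal{S}', \{\mathcal{C}_s\}_{s \in \mathcal{S}'}\big) \tag{11}$$
where $\mathcal{S}'$ is the set of deduplicated semantic units, $\mathcal{C}_s$ is the evidence retrieved for unit $s$, and $\mathcal{F}_{\mathrm{ref}}$ denotes the LLM-based global refinement procedure.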
3.2 Semantic Unit-Centric Knowledge Graph Construction
After completing the global semantic-unit optimization, GOSU uses the refined set to construct the knowledge graph via three stages: entity–relation extraction, subgraph construction, and graph assembly.
Entity–Relation Extraction
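Each global semantic unit $s \in \mathcal{S}^{*}$ is mapped to a graph node. For every $s$, an LLM extracts fine-grained entities $E_s$ and binary relations $R_s$, which are used to create entity nodes and relation edges while preserving context indices. Before assembly, we also extract locally identifiable entities and relations from each chunk to form a preliminary subgraph:
$$G_{\mathrm{pre}} = \bigcup_{c \in \mathcal{C}} \mathrm{ER}(c) \tag{12}$$
where $\mathcal{C}$ is the set of all text chunks and $\mathrm{ER}(c)$ returns a set of entity–relation assertions extracted from chunk $c$.
For each semantic unit, entities and relations are further gathered from its supporting chunks:
$$E_s = \bigcup_{c \in \mathcal{C},\ \mathrm{id}(c) \in I_s} \mathcal{F}_{E}(c \mid s) \tag{13}$$
$$R_s = \bigcup_{c \in \mathcal{C},\ \mathrm{id}(c) \in I_s} \mathcal{F}_{R}(c \mid s) \tag{14}$$
where $s$ is a semantic unit, $\mathrm{id}(c)$ gives the identifier of chunk $c$, and $I_s$ is the set of chunk IDs associated with $s$; $\mathcal{F}_{E}$ and $\mathcal{F}_{R}$ denote the LLM-based, $s$-conditioned entity and relation extractors applied to the context of $c$, respectively.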
Subgraph Construction
For each global semantic unit $s$, we first resolve ambiguity and remove duplicates among its associated entities $E_s$ and relations $R_s$. We then build an entity–relation subgraph centered at $s$, preserving both binary and higher-order ($n$-ary) structure:
$$G_s = \mathrm{Dedup}\big(\mathrm{Build}(s, E_s, R_s)\big) \tag{15}$$
where $\mathrm{Build}$ builds an $s$-centric subgraph by linking $s$ to entities in $E_s$ and instantiating relations in $R_s$ (binary edges; $n$-ary, if any, via a small relation node). $\mathrm{Dedup}$ merges co-referent entities/relations and removes duplicates. $G_s$ denotes the resulting cleaned subgraph.
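The subgraph construction can be pictured with the following sketch. All names here (`Subgraph`, `build_su_subgraph`, the `mentions` and `has_relation` edge labels, and the `alias` map standing in for coreference resolution) are assumptions made for illustration; the paper does not specify a concrete data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Subgraph:
    nodes: Dict[str, str] = field(default_factory=dict)           # node id -> node type
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)  # (source, label, target)


def build_su_subgraph(
    su_id: str,
    entities: List[str],
    binary_relations: List[Tuple[str, str, str]],
    nary_relations: List[Tuple[str, List[str]]],
    alias: Dict[str, str],
) -> Subgraph:
    """Build a subgraph centered on one semantic unit.

    `alias` maps surface forms to canonical names (a stand-in for the
    disambiguation step); storing edges in a set removes exact duplicates.
    """
    g = Subgraph()
    g.nodes[su_id] = "semantic_unit"

    def add_entity(name: str) -> str:
        canonical = alias.get(name, name)
        g.nodes.setdefault(canonical, "entity")
        return canonical

    for name in entities:
        g.edges.add((su_id, "mentions", add_entity(name)))
    for head, label, tail in binary_relations:
        g.edges.add((add_entity(head), label, add_entity(tail)))
    # An n-ary relation becomes a small relation node linked to every member.
    for index, (label, members) in enumerate(nary_relations):
        relation_id = f"{su_id}/rel{index}"
        g.nodes[relation_id] = "relation"
        g.edges.add((su_id, "has_relation", relation_id))
        for member in members:
            g.edges.add((relation_id, label, add_entity(member)))
    return g
```

Representing an $n$-ary relation as an extra node keeps the graph an ordinary (binary-edge) graph while still recording which entities participate in the same multi-entity fact.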
Knowledge Graph Assembly
Once all semantic-unit–centric subgraphs are constructed, we assemble them into the final knowledge graph by resolving cross-subgraph ambiguities and removing duplicates.
$$\mathcal{G} = \mathrm{Dedup}\big(\mathrm{Link}(\{G_s\}_{s \in \mathcal{S}^{*}})\big) \tag{16}$$
where $\{G_s\}_{s \in \mathcal{S}^{*}}$ denotes the set of all subgraphs; $\mathrm{Link}$ establishes cross-subgraph links by aligning co-referent entities/relations and adding inter-unit edges based on shared identifiers and context; $\mathrm{Dedup}$ then collapses remaining duplicates and resolves conflicts to produce $\mathcal{G}$.
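One simple way to realize the cross-subgraph linking is to connect two semantic units whenever they share a canonical entity. The sketch below assumes entities have already been canonicalized and reduces each subgraph to its entity set; the function name `link_units` is hypothetical, and a real system would also align relations and use context.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def link_units(unit_entities: Dict[str, Set[str]]) -> Set[Tuple[str, str, str]]:
    """Return inter-unit links as (unit_a, unit_b, shared_entity) triples."""
    entity_to_units = defaultdict(set)
    for unit, entities in unit_entities.items():
        for entity in entities:
            entity_to_units[entity].add(unit)

    links = set()
    for entity, units in entity_to_units.items():
        # Sorting makes each pair appear once, in a stable order.
        for unit_a, unit_b in combinations(sorted(units), 2):
            links.add((unit_a, unit_b, entity))
    return links
```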
3.3 Semantic Unit-Centric Retrieval and Generation
Once the knowledge graph is built, GOSU adopts two complementary retrieval–generation pathways to jointly capture fine-grained binary relations and global $n$-ary events.
Hierarchical Keyword Extraction
Building on LightRAG (Guo et al., 2024), we perform hierarchical keyword extraction from the user query to support low-cost, effective retrieval of fine-grained relations. Beyond prior work that uses only low-level entity keywords $K_{\mathrm{low}}$ and high-level thematic keywords $K_{\mathrm{high}}$, we introduce a mid-level "semantic-unit" tier $K_{\mathrm{su}}$, whose compact phrases encapsulate self-contained facts, relations, or events and thus improve retrieval precision with negligible overhead:
$$(K_{\mathrm{low}}, K_{\mathrm{su}}, K_{\mathrm{high}}) = \mathcal{F}_{\mathrm{kw}}(q, \mathcal{G}) \tag{17}$$
where $q$ is the input query, $\mathcal{G}$ is the constructed knowledge graph (used for optional conditioning and normalization), $K_{\mathrm{low}}$ collects entity-/attribute-level terms (e.g., names, IDs, types), $K_{\mathrm{su}}$ collects short semantic-unit phrases that summarize atomic facts or events, and $K_{\mathrm{high}}$ collects theme-/topic-level terms. Each is a (ranked) set of keywords yielded by the LLM extractor $\mathcal{F}_{\mathrm{kw}}$.
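To make the three keyword tiers concrete, the sketch below parses a hypothetical JSON reply from the keyword-extraction LLM into a small container. The JSON keys (`low_level`, `semantic_unit`, `high_level`) and the `QueryKeywords` type are assumptions for illustration only; the actual prompt and output format are not given in the text.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class QueryKeywords:
    low: List[str]       # entity-/attribute-level terms
    semantic: List[str]  # short semantic-unit phrases
    high: List[str]      # theme-/topic-level terms


def parse_keywords(raw: str) -> QueryKeywords:
    """Parse the (assumed) JSON reply, trimming blanks and case-insensitive repeats."""
    data = json.loads(raw)

    def clean(key: str) -> List[str]:
        seen, kept = set(), []
        for keyword in data.get(key, []):
            keyword = keyword.strip()
            if keyword and keyword.lower() not in seen:
                seen.add(keyword.lower())
                kept.append(keyword)  # original order is preserved as the ranking
        return kept

    return QueryKeywords(clean("low_level"), clean("semantic_unit"), clean("high_level"))
```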
Semantic-Unit Completion
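We first use low- and high-level keywords to locate target entities and relations, then enrich them with weakly related but semantically relevant nodes, edges, and chunks. In parallel, we extract directly involved semantic units to cover coarse, multi-entity events that basic keyword matching may miss:
$$(G_{\mathrm{low}}, \mathcal{C}_{\mathrm{low}}, \mathcal{S}_{\mathrm{low}}) = \mathrm{Ret}_{\mathrm{low}}(K_{\mathrm{low}}, \mathcal{G}, \mathcal{C}, \mathcal{S}^{*}) \tag{18}$$
$$(G_{\mathrm{high}}, \mathcal{C}_{\mathrm{high}}, \mathcal{S}_{\mathrm{high}}) = \mathrm{Ret}_{\mathrm{high}}(K_{\mathrm{high}}, \mathcal{G}, \mathcal{C}, \mathcal{S}^{*}) \tag{19}$$
where $K_{\mathrm{low}}$/$K_{\mathrm{high}}$ are the low-/high-level keyword sets, $\mathcal{G}$ is the knowledge graph, $\mathcal{C}$ the chunk set, and $\mathcal{S}^{*}$ the semantic-unit set. $\mathrm{Ret}_{\mathrm{low}}$ returns a keyword-matched, finely scoped subgraph $G_{\mathrm{low}}$ (with associated chunks $\mathcal{C}_{\mathrm{low}}$ and units $\mathcal{S}_{\mathrm{low}}$), while $\mathrm{Ret}_{\mathrm{high}}$ returns a theme-oriented subgraph $G_{\mathrm{high}}$ (with $\mathcal{C}_{\mathrm{high}}$, $\mathcal{S}_{\mathrm{high}}$), optionally expanded by lightweight graph heuristics (e.g., short-hop neighbors or similarity-ranked additions) to include weak but informative context.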
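When the candidates are insufficient, we augment them via similarity matching with semantic-level keywords:
$$\mathcal{S}_{\mathrm{sim}} = \mathrm{Sim}(K_{\mathrm{su}}, \mathcal{S}^{*}) \tag{20}$$
$$\mathcal{S}_{\mathrm{agg}} = \mathrm{Top}_{\theta}\big(\mathcal{S}_{\mathrm{low}} \cup \mathcal{S}_{\mathrm{high}} \cup \mathcal{S}_{\mathrm{sim}}\big) \tag{21}$$
$$|\mathcal{S}_{\mathrm{agg}}| \le \theta \tag{22}$$
where $K_{\mathrm{su}}$ is the semantic-level keyword set, $\mathrm{Sim}$ returns similarity-matched semantic units from $\mathcal{S}^{*}$, $\mathrm{Top}_{\theta}$ limits the set size, $|\cdot|$ returns the set cardinality, and $\theta$ is a size threshold.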
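The completion rule can be sketched as a fill-up procedure: directly matched units are kept, and similarity-matched units are appended only while the set is still below the size threshold. The function name `complete_units` and the choice not to trim an already-large direct set are assumptions of this sketch.

```python
from typing import List


def complete_units(direct_units: List[str], ranked_similar: List[str], threshold: int) -> List[str]:
    """Top up directly matched semantic units with similarity-ranked ones.

    Units are added in rank order until `threshold` units are held; if the
    direct matches already reach the threshold, nothing is added.
    """
    units = list(dict.fromkeys(direct_units))  # drop repeats, keep order
    for unit in ranked_similar:
        if len(units) >= threshold:
            break
        if unit not in units:
            units.append(unit)
    return units
```

With direct matches `["s1", "s2"]`, ranked similar units `["s2", "s5", "s6", "s7"]`, and a threshold of 4, the result is `["s1", "s2", "s5", "s6"]`.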
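Next, to further enrich both fine-grained binary relations and coarse-grained $n$-ary events, we traverse each semantic unit's associated entities and relations:
$$(G_{\mathrm{su}}, \mathcal{C}_{\mathrm{su}}) = \mathrm{Trav}(\mathcal{S}_{\mathrm{agg}}, \mathcal{G}, \mathcal{C}) \tag{23}$$
$$G_{\mathrm{agg}} = \mathrm{Top}_{B}\big(G_{\mathrm{low}} \cup G_{\mathrm{high}} \cup G_{\mathrm{su}}\big) \tag{24}$$
$$\mathcal{C}_{\mathrm{agg}} = \mathrm{Top}_{B}\big(\mathcal{C}_{\mathrm{low}} \cup \mathcal{C}_{\mathrm{high}} \cup \mathcal{C}_{\mathrm{su}}\big) \tag{25}$$
where $\mathcal{S}_{\mathrm{agg}}$ is the aggregated semantic-unit set (from Eq. (21)); $\mathrm{Trav}$ collects a subgraph $G_{\mathrm{su}}$ and chunk set $\mathcal{C}_{\mathrm{su}}$ by following entity/relation links and context indices associated with units in $\mathcal{S}_{\mathrm{agg}}$; $\mathrm{Top}_{B}$ limits the size of the returned graph/chunk sets to a preset budget $B$; $G_{\mathrm{low}}, G_{\mathrm{high}}, \mathcal{C}_{\mathrm{low}}, \mathcal{C}_{\mathrm{high}}$ are from Eqs. (18)–(19).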
Fusion for Generation
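Finally, we fuse retrieved snippets, semantic units, and graph context to guide the generator, producing an answer that cites fine-grained facts while maintaining global coherence across multi-entity events.
$$a = \mathcal{F}_{\mathrm{gen}}\big(q, \mathcal{S}_{\mathrm{agg}}, G_{\mathrm{agg}}, \mathcal{C}_{\mathrm{agg}}\big) \tag{26}$$
where $q$ is the input query, $\mathcal{S}_{\mathrm{agg}}$ is the aggregated semantic-unit set (Eq. (21)), $G_{\mathrm{agg}}$ the aggregated subgraph (Eq. (24)), $\mathcal{C}_{\mathrm{agg}}$ the aggregated chunk set (Eq. (25)); $\mathcal{F}_{\mathrm{gen}}$ denotes the LLM-based response generator conditioned on these inputs, and $a$ is the generated answer.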
4 Experiments
To comprehensively assess the effectiveness of GOSU on knowledge-intensive generation tasks, we conducted extensive experiments on several publicly available domain datasets, compared GOSU against a range of representative baselines, and performed systematic ablation studies.
4.1 Experimental Setup
Datasets.
To evaluate GOSU's cross-vertical performance and follow established experimental protocols (Guo et al., 2024; Luo et al., 2025a), we selected four domain datasets from the UltraDomain benchmark (Qian et al., 2024): Agriculture, Computer Science (CS), Legal, and Mix, together with a fifth dataset consisting of the most recent international hypertension guidelines (McCarthy et al., 2025). Additionally, following the generation methodology of Edge et al. (2024), we employed an LLM to synthesize distinct RAG user profiles for each vertical and, from each profile's perspective, generated multiple corpus-level queries that require holistic comprehension of the entire collection.
Baselines.
We compared GOSU against four state-of-the-art public RAG systems: NaiveRAG (Gao et al., 2023b), the standard baseline that retrieves fixed-length text chunks by similarity; LightRAG (Guo et al., 2024), a lightweight model that employs a two-tier retrieval strategy to balance recall and efficiency; HiRAG (Huang et al., 2025), a framework that leverages hierarchical knowledge representations to enhance semantic understanding and capture structural relations; and HyperGraphRAG (Luo et al., 2025a), a novel RAG approach that incorporates hypergraph structures to capture higher-order, multi-entity relations.
Evaluation Metrics.
To more thoroughly evaluate outcomes, particularly for queries that invoke complex, high-level semantics, we follow the evaluation protocol of KnowTuning and related work (Lyu et al., 2024; Guo et al., 2024; Edge et al., 2024) and adopt four assessment dimensions: Comprehensiveness, Diversity, Empowerment, and Overall. To ensure evaluation accuracy and mitigate potential positional bias (Zheng et al., 2023; Pezeshkpour and Hruschka, 2023), we employed an alternating pairwise comparison protocol in which candidate answers were presented in randomized left–right order and judged pairwise; for each evaluation dimension we selected the preferred answer based on these pairwise judgments. Specifically, we accept a comparison outcome only when the alternating pairwise judgments produce a consistent preference; if they do not, we treat the observed quality difference as being below the level of positional bias, deem the result inconclusive, and exclude it from further analysis. The final overall preference was determined by aggregating the rankings across the three primary dimensions (Comprehensiveness, Diversity, and Empowerment), with ties resolved using the Overall score.
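The consistency rule of the alternating pairwise protocol can be sketched as follows. The judge is modeled as a hypothetical callable that sees a left and a right answer and returns `"left"` or `"right"`; the function names and the labels `"A"`/`"B"` are assumptions of this sketch rather than the evaluation code used in the paper.

```python
from typing import Callable, List, Optional, Tuple


def consistent_winner(
    judge: Callable[[str, str], str],
    answer_a: str,
    answer_b: str,
) -> Optional[str]:
    """Judge the pair in both orders; return "A", "B", or None if inconsistent."""
    a_wins_first = judge(answer_a, answer_b) == "left"    # A shown on the left
    a_wins_second = judge(answer_b, answer_a) == "right"  # A shown on the right
    if a_wins_first and a_wins_second:
        return "A"
    if not a_wins_first and not a_wins_second:
        return "B"
    return None  # preference flipped with position: treated as inconclusive


def win_rates(outcomes: List[Optional[str]]) -> Tuple[float, float]:
    """Win rates of A and B over the conclusive comparisons only."""
    kept = [outcome for outcome in outcomes if outcome is not None]
    if not kept:
        return (0.0, 0.0)
    rate_a = kept.count("A") / len(kept)
    return (rate_a, 1.0 - rate_a)
```

A judge that always prefers the left answer yields `None` for every pair, so purely positional preferences never contribute to a win rate.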
Implementation Details.
We used GPT-4o-mini as the generative model and BGE-m3 for vector embeddings. To ensure experimental consistency and fair comparison, chunk size and all other retrieval- and generation-related hyperparameters were held identical across all methods.
4.2 Experimental Results
We compared GOSU against the baseline methods across each domain along multiple evaluation dimensions; the results are summarized in Table 1 and Table 2.
General Comparison.
As shown in Table 1, GOSU demonstrated stable performance across domains, consistently achieving higher win rates than all baselines on the four evaluation dimensions—Comprehensiveness, Diversity, Empowerment, and Overall—indicating its superior ability to produce more complete, varied, and practically useful responses.
Compared to NaiveRAG, GOSU achieved an average win-rate margin exceeding 50% across all evaluation dimensions, highlighting the superiority of graph-based RAG approaches over chunk-based retrieval in capturing complex semantic dependencies for knowledge-intensive tasks. Although GOSU is also a graph-augmented RAG method, it consistently outperforms LightRAG and HyperGraphRAG. This result indicates that, compared with approaches that rely solely on pairwise edges or purely $n$-ary hyperedges, GOSU more effectively integrates fine-grained and coarse-grained semantic units and leverages them during retrieval and generation to produce higher-quality responses. Among the baselines, HiRAG exhibits the smallest performance gap relative to GOSU. This finding indicates that hierarchical knowledge representations do enhance semantic understanding and structural capture, and it further validates that GOSU’s strategy of driving the pipeline with globally completed semantic units can achieve comparable or even superior results by explicitly integrating corpus-level semantic coherence with a dual-phase retrieval-and-generation process.
Experiments across multiple datasets highlight GOSU’s superior capability to integrate semantic information and to recognize structural variations across tasks and domains, with particularly strong gains in knowledge-intensive scenarios.
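As a worked check of how such a win-rate margin is obtained, the short sketch below recomputes the average gap for the Comprehensiveness dimension from the reported NaiveRAG versus GOSU win rates. The function name `average_gap` is illustrative, and the margin is read here as a difference in percentage points averaged over the five domains, which is an assumption about how the "Avg gap" column is defined.

```python
# (NaiveRAG win rate, GOSU win rate) in percent, Comprehensiveness dimension.
comprehensiveness = {
    "Agriculture": (12.0, 88.0),
    "CS": (14.3, 85.7),
    "Hypertension": (11.1, 88.9),
    "Legal": (12.0, 88.0),
    "Mix": (23.1, 76.9),
}


def average_gap(row):
    """Mean of (GOSU - baseline) win-rate differences across domains, in points."""
    gaps = [gosu - baseline for baseline, gosu in row.values()]
    return sum(gaps) / len(gaps)


print(f"{average_gap(comprehensiveness):+.1f}%")  # prints +71.0%
```

The per-domain gaps are 76.0, 71.4, 77.8, 76.0, and 53.8 points, so the Mix domain is the one that pulls the average down.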
Ablation Study.
To rigorously assess the contribution of each component within the GOSU framework, we conducted extensive ablation studies at both the knowledge construction stage and the retrieval and generation stage (see Table 2).
Knowledge Construction Stage. We ablated the global-level semantic unit optimization (w/o GO). This modification produced pronounced declines in Comprehensiveness, Empowerment, and Overall scores, with the performance degradation particularly marked on the medical and legal benchmarks. These results indicate that the GO module is crucial for globally extracting and completing $n$-ary relations and for facilitating the identification of relevant binary relations, thereby improving evidence aggregation and downstream generation quality.
Retrieval and Generation Stage. We ablated each component of the three-stage retrieval mechanism in turn, removing the entity layer (w/o EL), the relation layer (w/o RL), and the semantic-unit layer (w/o SL). All ablations produced measurable performance degradations, with the removal of the semantic-unit layer yielding the largest decline. This result validates the critical role of semantic units in completing coarse-grained information and supporting robust multi-entity evidence aggregation. Additionally, we ablated both the entity and relation layers (w/o EL & RL). The combined removal produced a marked performance degradation, further confirming that fine-grained knowledge, encoded by entity- and relation-level signals, is also indispensable for producing high-quality, factually grounded generations.
These experimental results demonstrate that each module deployed across GOSU’s pipeline is necessary to achieve optimal generation quality.
Analysis of Efficiency and Cost.
We conducted a comprehensive cost comparison of GOSU and four baseline methods across the knowledge construction (offline) and retrieval and generation (online) phases. The experimental results are presented in Fig. 3. We measured four cost metrics: token consumption for vector embeddings per text chunk (TPC), prompt-completion cost per text chunk (CPC), token consumption for vector embeddings per query (TPQ), and prompt-completion cost per query (CPQ).
During the knowledge construction phase, the global semantic unit optimization module increased GOSU’s embedding token consumption: its TPC reached 29,560 tokens, substantially higher than HyperGraphRAG’s 4,940 tokens. In terms of CPC, GOSU incurred 0.00806. During the retrieval and generation phase, GOSU recorded a TPQ of 50 tokens, slightly higher than LightRAG (30 tokens) but lower than HyperGraphRAG (70 tokens). For CPQ, GOSU’s cost was higher than […] (0.00600) and lower than HiRAG ([…]). […] $n$-ary relations, and thereby effectively supports a three-stage retrieval-and-generation pipeline. Moreover, because the usage costs of many high-performance models, including those employed in this study, have been steadily decreasing and in some cases have become free, the additional token overhead is acceptable.
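To make the four cost metrics concrete, the sketch below computes a per-chunk pair (TPC, CPC) and a per-query pair (TPQ, CPQ) from logged token usage. It is only an illustration under stated assumptions: the metrics are taken to be per-unit averages, the `Usage` record and `per_unit_metrics` function are hypothetical, and the prices are placeholder values rather than the rates of any real model.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Usage:
    """Token usage logged for one text chunk (offline) or one query (online)."""
    embedding_tokens: int
    prompt_tokens: int
    completion_tokens: int


def per_unit_metrics(
    usages: Iterable[Usage],
    prompt_price_per_1k: float,
    completion_price_per_1k: float,
) -> Tuple[float, float]:
    """Return (average embedding tokens, average prompt-completion cost) per unit."""
    records = list(usages)
    if not records:
        raise ValueError("at least one usage record is required")
    n = len(records)
    tokens = sum(u.embedding_tokens for u in records) / n
    cost = sum(
        u.prompt_tokens / 1000 * prompt_price_per_1k
        + u.completion_tokens / 1000 * completion_price_per_1k
        for u in records
    ) / n
    return tokens, cost


# Placeholder prices per 1,000 tokens (hypothetical, for illustration only).
PROMPT_PRICE, COMPLETION_PRICE = 0.001, 0.002

chunk_logs = [Usage(100, 1000, 1000), Usage(200, 1000, 1000)]
query_logs = [Usage(40, 2000, 500), Usage(60, 2000, 500)]

tpc, cpc = per_unit_metrics(chunk_logs, PROMPT_PRICE, COMPLETION_PRICE)
tpq, cpq = per_unit_metrics(query_logs, PROMPT_PRICE, COMPLETION_PRICE)
print(f"TPC={tpc:.0f} CPC={cpc:.5f} TPQ={tpq:.0f} CPQ={cpq:.5f}")
```

Keeping the offline and online logs separate mirrors the split between the knowledge construction phase and the retrieval and generation phase.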
Regarding efficiency, because GOSU performs pairwise similarity comparisons between semantic units during knowledge-graph construction, it incurs substantial computational overhead and leads to increased preprocessing time—especially on large corpora. This pairwise matching step is computationally intensive compared with simpler indexing strategies and represents the primary source of GOSU’s higher offline latency. Nonetheless, knowledge-graph construction is typically a one-time operation in most deployment scenarios, and subsequent system activity concentrates on retrieval and query handling; therefore, the upfront cost does not materially degrade ongoing online efficiency.
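The quadratic cost of this step can be seen in a small sketch of pairwise matching over semantic-unit embeddings. This is not GOSU's implementation: the cosine-similarity criterion, the `threshold` value, the union-find grouping, and the function names are all assumptions chosen to illustrate why comparing every pair of units grows as n(n-1)/2.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def merge_similar_units(
    embeddings: List[Sequence[float]], threshold: float = 0.9
) -> Tuple[List[List[int]], int]:
    """Group unit indices whose similarity meets the threshold; also count comparisons."""
    parent = list(range(len(embeddings)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    comparisons = 0
    for i, j in combinations(range(len(embeddings)), 2):
        comparisons += 1  # every pair is compared once: n * (n - 1) / 2 in total
        if cosine(embeddings[i], embeddings[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(embeddings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values()), comparisons


units = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
groups, comparisons = merge_similar_units(units)
print(groups, comparisons)
```

With three units the loop performs three comparisons; with 10,000 units it would perform roughly 50 million, which is why this step dominates offline preprocessing time on large corpora.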
5 Conclusion
GOSU is a new framework that centers the RAG pipeline on semantic units optimized at the corpus level. By using these globally consistent units to guide the extraction of binary and $n$-ary relations, GOSU achieves a balanced fusion of fine-grained entity links and coarse-grained multi-entity structures, resulting in more faithful, comprehensive, and coherent retrieval-augmented generation. Unlike approaches limited to individual text chunks, GOSU introduces a two-stage coarse-to-fine filtering mechanism to better summarize semantic units and extract structural information at the global corpus level. Driven by these semantic units, the SU-centric design and three-stage retrieval pipeline supplement low-level signals with tightly related high-level perspectives, yielding more coherent and semantically complete retrieval and generation. Extensive experiments demonstrate that GOSU consistently outperforms existing RAG pipelines across diverse vertical domains and related tasks. GOSU does sacrifice some efficiency and incurs higher preprocessing and token costs to deliver this quality, but our measurements show that these overheads are moderate, remain within acceptable bounds, and are justified by the gains. Taken together, GOSU emphasizes the equal and complementary roles of fine- and coarse-grained relational modeling within graph-based RAG frameworks, delivering a scalable and high-quality approach for real-world AIGC tasks that require faithful, comprehensive, and coherent knowledge integration.
Limitations
Cross-domain experiments demonstrate that GOSU achieves substantial improvements in retrieval-augmented generation, but there remains room for further refinement.
- First, the current method does not incorporate multimodal inputs and therefore cannot fully exploit knowledge embedded in images, tables, and other non-textual artifacts within multimodal corpora, which may lead to omission of important information.
- Second, GOSU primarily focuses on $n$-ary and binary relations and may lack the capacity to uncover deep chains of reasoning required for more complex inferential tasks.
- Additionally, although GOSU enriches knowledge structure by centering on semantic units and employing a three-layer retrieval pipeline, there is still potential to further improve retrieval and generation efficiency.
Future work will investigate methods to overcome the limitations identified above.
Ethics Statement
This paper investigates RAG via GOSU, a semantic unit-centric framework that globally optimizes semantic units to drive extraction of binary and $n$-ary relations. We employ large language models for semantic-unit extraction and SU-centric graph construction, together with retrieval-augmented generation techniques to improve knowledge representation and generation quality. All data used in this study are publicly available and contain no personally identifiable or sensitive information; therefore, we believe the work adheres to ethical principles.
References
- Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, and Caiming Xiong. 2019. Learning to retrieve reasoning paths over Wikipedia graph for question answering. Preprint, arXiv:1911.10470.
- Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. 2023. Self-RAG: Learning to retrieve, generate, and critique through Self-Reflection. Preprint, arXiv:2310.11511.
- Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, and Mike Zheng Shou. 2024. Hallucination of Multimodal Large Language Models: A survey. Preprint, arXiv:2404.18930.
- Yejin Bang, Ziwei Ji, Alan Schelten, Anthony Hartshorn, Tara Fowler, Cheng Zhang, Nicola Cancedda, and Pascale Fung. 2025. HalluLens: LLM hallucination benchmark. Preprint, arXiv:2504.17550.
- Joel Barmettler, Abraham Bernstein, and Luca Rossetto. 2025. ConceptFormer: Towards efficient use of Knowledge-Graph embeddings in Large Language Models. Preprint, arXiv:2504.07624.
- Dominique Brunato, Felice Dell’Orletta, Irene Dini, and Andrea Amelio Ravelli. 2023. Coherent or not? Stressing a neural language model for discourse coherence in multiple languages. In Findings of the Association for Computational Linguistics: ACL 2023, pages 10690–10700, Toronto, Canada. Association for Computational Linguistics.
- Weijie Chen, Ting Bai, Jinbo Su, Jian Luan, Wei Liu, and Chuan Shi. 2024. KG-Retriever: Efficient knowledge indexing for Retrieval-Augmented Large Language Models. Preprint, arXiv:2412.05547.
- Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, and Jonathan Larson. 2024. From local to global: A graph RAG approach to Query-Focused Summarization. Preprint, arXiv:2404.16130.
