Disco-RAG: Discourse-Aware Retrieval-Augmented Generation
Dongqi Liu, Hang Ding, Qiming Feng, Xurong Xie, Zhucun Xue, Chengjie Wang, Jian Li, Jiangning Zhang, Yabiao Wang

TL;DR
Disco-RAG introduces a discourse-aware framework for retrieval-augmented generation that leverages discourse structures to improve performance on knowledge-intensive tasks.
Contribution
It proposes constructing discourse trees and rhetorical graphs to explicitly incorporate discourse cues into RAG models, enhancing their ability to synthesize dispersed evidence.
Findings
Achieves state-of-the-art results on question answering benchmarks.
Improves long-document summarization performance without fine-tuning.
Demonstrates the importance of discourse structure in RAG systems.
Abstract
Retrieval-Augmented Generation (RAG) has emerged as an important means of enhancing the performance of large language models (LLMs) in knowledge-intensive tasks. However, most existing RAG strategies treat retrieved passages in a flat and unstructured way, which prevents the model from capturing structural cues and constrains its ability to synthesize knowledge from dispersed evidence across documents. To overcome these limitations, we propose Disco-RAG, a discourse-aware framework that explicitly injects discourse signals into the generation process. Our method constructs intra-chunk discourse trees to capture local hierarchies and builds inter-chunk rhetorical graphs to model cross-passage coherence. These structures are jointly integrated into a planning blueprint that conditions the generation. Experiments on question answering and long-document summarization benchmarks show the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
