Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering
Huiyao Chen, Yi Yang, Yinghui Li, Meishan Zhang, Baotian Hu, Min Zhang

TL;DR
This paper introduces a discourse-aware hierarchical framework for long document question answering that uses Rhetorical Structure Theory (RST) to improve retrieval and comprehension across multiple languages and genres.
Contribution
The paper combines discourse parsing, LLM-based node enhancement, and structure-guided retrieval so that long document QA can make better use of discourse structure.
Findings
Consistent performance improvements over existing methods across four datasets.
Effective discourse parsing for lengthy documents using language-universal methods.
Robustness demonstrated across diverse document types and languages.
Abstract
Existing long-document question answering systems typically process texts as flat sequences or use heuristic chunking, which overlook the discourse structures that naturally guide human comprehension. We present a discourse-aware hierarchical framework that leverages rhetorical structure theory (RST) for long document question answering. Our approach converts discourse trees into sentence-level representations and employs LLM-enhanced node representations to bridge structural and semantic information. The framework involves three key innovations: language-universal discourse parsing for lengthy documents, LLM-based enhancement of discourse relation nodes, and structure-guided hierarchical retrieval. Extensive experiments on four datasets demonstrate consistent improvements over existing approaches through the incorporation of discourse structure, across multiple genres and languages.…
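To make "structure-guided hierarchical retrieval" concrete, the following is a minimal sketch of top-down retrieval over a toy discourse tree. It is not the authors' implementation: the `Node` class, the `retrieve` function, and the `beam` parameter are hypothetical names introduced for illustration, and a bag-of-words cosine similarity stands in for the LLM-enhanced node representations described in the abstract.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a toy discourse tree (hypothetical structure, not from the paper)."""
    text: str                    # sentence text for a leaf, short summary for an internal node
    relation: str = "leaf"       # RST-style relation label, e.g. "Elaboration"
    children: list[Node] = field(default_factory=list)


def bow(text: str) -> Counter:
    """Lowercased bag-of-words vector; a stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(root: Node, query: str, beam: int = 1) -> list[str]:
    """Descend the tree level by level, keeping the `beam` best children of each node."""
    q = bow(query)
    frontier, leaves = [root], []
    while frontier:
        next_frontier = []
        for node in frontier:
            if not node.children:          # reached a sentence-level leaf
                leaves.append(node.text)
                continue
            ranked = sorted(node.children, key=lambda c: cosine(q, bow(c.text)), reverse=True)
            next_frontier.extend(ranked[:beam])
        frontier = next_frontier
    return leaves


# Toy tree: internal nodes carry a summary of their subtree.
tree = Node("document about parsing and retrieval", "Joint", [
    Node("discourse parsing splits documents into segments", "Elaboration", [
        Node("The parser splits long documents into segments."),
        Node("Each segment is parsed into a discourse tree."),
    ]),
    Node("retrieval ranks nodes against the question", "Elaboration", [
        Node("Retrieval ranks tree nodes against the question."),
        Node("The top leaves are passed to the reader."),
    ]),
])

result = retrieve(tree, "How does retrieval rank nodes?", beam=1)
print(result)  # ['Retrieval ranks tree nodes against the question.']
```

The contrast with flat chunking is that whole subtrees are pruned using the internal-node summaries before any individual sentence is scored. Raising `beam` trades precision for recall by following more branches at each level.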
