BookRAG: A Hierarchical Structure-aware Index-based Approach for Retrieval-Augmented Generation on Complex Documents
Shu Wang, Yingli Zhou, Yixiang Fang

TL;DR
BookRAG introduces a hierarchical, structure-aware indexing method for retrieval-augmented generation on complex documents, significantly improving question answering performance by exploiting document hierarchies and entity relations.
Contribution
It presents BookRAG, a novel hierarchical index structure and query method tailored for complex, structured documents, enhancing retrieval and QA accuracy.
Findings
Achieves state-of-the-art retrieval recall and QA accuracy.
Outperforms baseline methods on three benchmark datasets.
Maintains competitive efficiency in retrieval processes.
Abstract
Retrieval-Augmented Generation (RAG) is an effective method for boosting the performance of Large Language Models (LLMs) on the question answering (QA) task by querying highly relevant information from external complex documents, and it has attracted tremendous attention from both industry and academia. Existing RAG approaches often focus on general documents and overlook the fact that many real-world documents (such as books, booklets, and handbooks) have a hierarchical structure that organizes their content at different levels of granularity; this oversight leads to poor QA performance. To address these limitations, we introduce BookRAG, a novel RAG approach targeted at documents with a hierarchical structure, which exploits logical hierarchies and traces entity relations to query highly relevant information. Specifically, we build a novel index structure, called BookIndex, by…
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Advanced Text Analysis Techniques
