DocSage: An Information Structuring Agent for Multi-Doc Multi-Entity Question Answering
Teng Lin, Yizhang Zhu, Zhengxuan Zhang, Yuyu Luo, Nan Tang

TL;DR
DocSage is an agentic framework for multi-document, multi-entity question answering. It integrates schema discovery, structured information extraction, and relational reasoning, and improves accuracy over existing methods.
Contribution
The paper presents a novel agentic framework combining dynamic schema discovery, structured extraction, and schema-aware reasoning for improved multi-entity question answering.
Findings
Improves accuracy by over 27% on the evaluated benchmarks.
Effectively captures complex entity relationships across documents.
Reduces extraction errors with error-aware correction mechanisms.
Abstract
Multi-document Multi-entity Question Answering inherently requires models to track implicit logical relationships among multiple entities across scattered documents. However, existing Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) frameworks have critical limitations. Standard RAG's coarse-grained retrieval, based on vector similarity, often omits critical facts. Graph-based RAG fails to efficiently integrate fragmented, complex relationship networks. Both lack schema awareness, which leads to inadequate construction of cross-document evidence chains and inaccurate inference of entity relationships. To address these challenges, we propose DocSage, an end-to-end agentic framework that integrates dynamic schema discovery, structured information extraction, and schema-aware relational reasoning with error guarantees. DocSage operates through three core modules: (1) A schema discovery…
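The abstract names three stages but gives no implementation details in this excerpt: schema discovery, structured extraction, and schema-aware relational reasoning. The toy pipeline below is a minimal sketch of the general idea of turning documents into a queryable table before reasoning over it. It is not DocSage's implementation. Several parts are assumptions made for illustration:

* All names are hypothetical, including `discover_schema`, `extract_records`, `load_table`, `answer`, `ATTRIBUTE_PATTERNS`, and the sample documents.
* A fixed keyword vocabulary stands in for dynamic schema discovery.
* Regex matching stands in for LLM-based extraction.
* The SQL query is hand-written rather than generated by a reasoning module.

```python
import re
import sqlite3

# Hypothetical attribute vocabulary: attribute name -> regex capturing its value.
# Stands in for dynamic schema discovery, which this sketch does not model.
ATTRIBUTE_PATTERNS = {
    "founded": r"founded in (\d{4})",
    "employees": r"(\d+) employees",
}


def discover_schema(question: str) -> list[str]:
    """Stage 1 (toy): keep the attributes whose name appears in the question."""
    q = question.lower()
    return [attr for attr in ATTRIBUTE_PATTERNS if attr in q]


def extract_records(docs: dict[str, str], schema: list[str]) -> list[dict]:
    """Stage 2 (toy): extract one integer value per schema attribute per document."""
    records = []
    for entity, text in docs.items():
        row = {"entity": entity}
        for attr in schema:
            match = re.search(ATTRIBUTE_PATTERNS[attr], text)
            row[attr] = int(match.group(1)) if match else None  # None -> SQL NULL
        records.append(row)
    return records


def load_table(records: list[dict], schema: list[str]) -> sqlite3.Connection:
    """Load extracted records into an in-memory SQLite table named `facts`."""
    conn = sqlite3.connect(":memory:")
    # Column names come from the trusted vocabulary above, not from user input.
    cols = ", ".join(f"{attr} INTEGER" for attr in schema)
    conn.execute(f"CREATE TABLE facts (entity TEXT, {cols})")
    placeholders = ", ".join("?" for _ in range(len(schema) + 1))
    conn.executemany(
        f"INSERT INTO facts VALUES ({placeholders})",
        [tuple(r[c] for c in ["entity", *schema]) for r in records],
    )
    return conn


def answer(question: str, docs: dict[str, str], sql: str):
    """Stage 3 (toy): run a relational query over the structured table."""
    schema = discover_schema(question)
    conn = load_table(extract_records(docs, schema), schema)
    try:
        row = conn.execute(sql).fetchone()
    finally:
        conn.close()
    return row[0] if row else None


QUESTION = "Among companies founded before 2000, which has the most employees?"
DOCS = {
    "Acme": "Acme was founded in 1998 and now has 500 employees.",
    "Globex": "Globex, founded in 2005, reports 1200 employees.",
    "Initech": "Initech was founded in 1990. It has 300 employees.",
}
# Hand-written SQL standing in for the schema-aware reasoning step.
QUERY = "SELECT entity FROM facts WHERE founded < 2000 ORDER BY employees DESC LIMIT 1"

print(answer(QUESTION, DOCS, QUERY))  # → Acme
```

In this sketch, once facts from separate documents sit in one table, a cross-document constraint such as "founded before 2000" becomes a `WHERE` clause rather than something a similarity search must happen to retrieve. This is one way to read the abstract's argument against coarse-grained, vector-similarity retrieval.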
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Data Quality and Management
