ReportLogic: Evaluating Logical Quality in Deep Research Reports
Jujia Zhao, Zhaoxin Huan, Zihan Wang, Xiaolu Zhang, Jun Zhou, Suzan Verberne, and Zhaochun Ren

TL;DR
ReportLogic introduces a comprehensive benchmark and evaluation framework for assessing the logical quality of deep research reports generated by LLMs, emphasizing traceability, understanding, and verification of claims.
Contribution
It presents a hierarchical taxonomy for logical evaluation, a human-annotated dataset, and an open-source LogicJudge model for scalable, robust assessment of report quality.
Findings
LLM judges are often misled by superficial cues like verbosity
Adversarial attacks reveal vulnerabilities in current evaluation methods
LogicJudge can guide improvements in report logical reliability
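The first finding, that judges can be swayed by verbosity, suggests a simple sanity check: correlate a judge's scores with report length. The sketch below is illustrative only and is not the paper's method. The function names (`pearson`, `verbosity_correlation`) and the toy reports and scores are hypothetical, made up for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def verbosity_correlation(reports, judge_scores):
    """Correlate whitespace-delimited word counts with judge scores.

    Hypothetical helper: a strongly positive value hints that the judge
    may be rewarding length, but it does not prove a causal bias.
    """
    word_counts = [len(report.split()) for report in reports]
    return pearson(word_counts, judge_scores)


# Toy inputs, invented for illustration (not data from the paper).
toy_reports = [
    "short claim with evidence",
    "a much longer report " * 20,
    "medium length report " * 5,
]
toy_scores = [3.0, 5.0, 4.0]

r = verbosity_correlation(toy_reports, toy_scores)
print(f"length-score correlation: {r:.2f}")
```

A high correlation alone is not conclusive, since longer reports may genuinely be better; a controlled comparison would pad reports without changing their logic and check whether scores move.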
Abstract
Users increasingly rely on Large Language Models (LLMs) for Deep Research, using them to synthesize diverse sources into structured reports that support understanding and action. In this context, the practical reliability of such reports hinges on logical quality: whether the report's claims and arguments are explicitly supported and can be trusted as a basis for downstream use, rather than merely appearing fluent or informative. However, current evaluation frameworks largely overlook this requirement. To bridge this gap, we introduce ReportLogic, a benchmark that quantifies report-level logical quality through a reader-centric lens of auditability. Specifically, ReportLogic adopts a hierarchical taxonomy that evaluates whether readers can (1) trace an on-topic report structure with a unified analytical arc (Macro-Logic), (2) understand the progression with necessary context…
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Artificial Intelligence in Healthcare and Education
