ReportLogic: Evaluating Logical Quality in Deep Research Reports

Jujia Zhao; Zhaoxin Huan; Zihan Wang; Xiaolu Zhang; Jun Zhou; Suzan Verberne; and Zhaochun Ren

arXiv:2602.18446 · cs.CL · February 24, 2026

Open Access

TL;DR

ReportLogic introduces a comprehensive benchmark and evaluation framework for assessing the logical quality of deep research reports generated by LLMs, emphasizing traceability, understanding, and verification of claims.

Contribution

It presents a hierarchical taxonomy for logical evaluation, a human-annotated dataset, and an open-source LogicJudge model for scalable, robust assessment of report quality.

Findings

01. LLM judges are often misled by superficial cues like verbosity

02. Adversarial attacks reveal vulnerabilities in current evaluation methods

03. LogicJudge can guide improvements in report logical reliability

Abstract

Users increasingly rely on Large Language Models (LLMs) for Deep Research, using them to synthesize diverse sources into structured reports that support understanding and action. In this context, the practical reliability of such reports hinges on logical quality: whether the report's claims and arguments are explicitly supported and can be trusted as a basis for downstream use, rather than merely appearing fluent or informative. However, current evaluation frameworks largely overlook this requirement. To bridge this gap, we introduce ReportLogic, a benchmark that quantifies report-level logical quality through a reader-centric lens of auditability. Specifically, ReportLogic adopts a hierarchical taxonomy that evaluates whether readers can (1) trace an on-topic report structure with a unified analytical arc (Macro-Logic), (2) understand the progression with necessary context…

Taxonomy

Topics: Computational and Text Analysis Methods · Topic Modeling · Artificial Intelligence in Healthcare and Education