DIAGPaper: Diagnosing Valid and Specific Weaknesses in Scientific Papers via Multi-Agent Reasoning

Zhuoyang Zou; Abolfazl Ansari; Delvin Ce Zhang; Dongwon Lee; Wenpeng Yin

arXiv:2601.07611 · cs.AI · February 19, 2026

Open Access

TL;DR

DIAGPaper is a multi-agent framework for identifying, validating, and prioritizing weaknesses in scientific papers. It simulates human-defined review criteria, runs structured author-reviewer debate to check whether each weakness holds up, and ranks the surviving issues by severity, yielding more accurate and user-focused review insights.

Contribution

This work introduces DIAGPaper, a novel multi-agent system that models review criteria, incorporates author-reviewer debates, and prioritizes weaknesses, addressing key limitations of prior methods.
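
The three stages named above (criterion-specific reviewers, author-reviewer debate, severity-based prioritization) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Weakness` fields, the `diagnose` function, the two toy reviewers, and the `toy_survives` check are all hypothetical stand-ins, and the LLM agents are replaced by hard-coded functions so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Weakness:
    criterion: str   # review criterion that raised the issue (hypothetical field)
    text: str        # the weakness statement
    evidence: str    # passage the reviewer cites from the paper (hypothetical field)
    severity: float  # assumed score in [0, 1]; higher means more consequential


# A reviewer maps paper text to candidate weaknesses for one criterion.
Reviewer = Callable[[str], List[Weakness]]
# A debate check returns True if the weakness survives the author's rebuttal.
Survives = Callable[[str, Weakness], bool]


def diagnose(paper: str, reviewers: List[Reviewer], survives: Survives) -> List[Weakness]:
    # Stage 1: each criterion-specific reviewer proposes weaknesses.
    candidates = [w for review in reviewers for w in review(paper)]
    # Stage 2: keep only weaknesses that survive the rebuttal check.
    validated = [w for w in candidates if survives(paper, w)]
    # Stage 3: rank the remaining weaknesses, most severe first.
    return sorted(validated, key=lambda w: w.severity, reverse=True)


# Toy stand-ins for LLM agents (assumptions for illustration only).
def rigor_reviewer(paper: str) -> List[Weakness]:
    return [Weakness("rigor", "No baseline comparison",
                     "No baselines are reported", 0.9)]


def clarity_reviewer(paper: str) -> List[Weakness]:
    return [
        Weakness("clarity", "Notation is introduced late",
                 "Notation: x denotes input", 0.3),
        Weakness("clarity", "Claims state-of-the-art without support",
                 "state-of-the-art", 0.7),
    ]


def toy_survives(paper: str, w: Weakness) -> bool:
    # Toy rebuttal: the author rejects any weakness whose cited evidence
    # does not actually appear in the paper.
    return w.evidence in paper


PAPER = "We propose a method. Notation: x denotes input. No baselines are reported."
ranked = diagnose(PAPER, [rigor_reviewer, clarity_reviewer], toy_survives)

for w in ranked:
    print(f"{w.severity:.1f} [{w.criterion}] {w.text}")
```

In this toy run the unsupported "state-of-the-art" weakness is dropped because its cited evidence is absent from the paper, and the two remaining weaknesses are listed from most to least severe. In a real system each stub would be an LLM-backed agent, and the validity check would be a multi-turn exchange rather than a substring test.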

Findings

01. Outperforms existing methods in validity and specificity of weaknesses

02. Produces more paper-specific and prioritized weakness lists

03. Demonstrates effectiveness on AAAR and ReviewCritique benchmarks

Abstract

Paper weakness identification using single-agent or multi-agent LLMs has attracted increasing attention, yet existing approaches exhibit key limitations. Many multi-agent systems simulate human roles at a surface level, missing the underlying criteria that lead experts to assess complementary intellectual aspects of a paper. Moreover, prior methods implicitly assume identified weaknesses are valid, ignoring reviewer bias, misunderstanding, and the critical role of author rebuttals in validating review quality. Finally, most systems output unranked weakness lists, rather than prioritizing the most consequential issues for users. In this work, we propose DIAGPaper, a novel multi-agent framework that addresses these challenges through three tightly integrated modules. The customizer module simulates human-defined review criteria and instantiates multiple reviewer agents with…

Taxonomy

Topics: Expert finding and Q&A systems · Topic Modeling · Academic integrity and plagiarism