DIAGPaper: Diagnosing Valid and Specific Weaknesses in Scientific Papers via Multi-Agent Reasoning
Zhuoyang Zou, Abolfazl Ansari, Delvin Ce Zhang, Dongwon Lee, Wenpeng Yin

TL;DR
DIAGPaper is a multi-agent framework that identifies, validates, and prioritizes paper weaknesses. It simulates review criteria, runs structured author-reviewer debate, and ranks issues by severity, yielding more accurate and user-focused review insights.
Contribution
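This work introduces DIAGPaper, a novel multi-agent system that models review criteria, incorporates author-reviewer debates, and prioritizes weaknesses. It targets three limitations of prior methods: surface-level role simulation, the assumption that every identified weakness is valid, and unranked weakness lists.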
Findings
Outperforms existing methods in the validity and specificity of the weaknesses it identifies
Produces more paper-specific and prioritized weakness lists
Demonstrates effectiveness on AAAR and ReviewCritique benchmarks
Abstract
Paper weakness identification using single-agent or multi-agent LLMs has attracted increasing attention, yet existing approaches exhibit key limitations. Many multi-agent systems simulate human roles at a surface level, missing the underlying criteria that lead experts to assess complementary intellectual aspects of a paper. Moreover, prior methods implicitly assume identified weaknesses are valid, ignoring reviewer bias, misunderstanding, and the critical role of author rebuttals in validating review quality. Finally, most systems output unranked weakness lists, rather than prioritizing the most consequential issues for users. In this work, we propose DIAGPaper, a novel multi-agent framework that addresses these challenges through three tightly integrated modules. The customizer module simulates human-defined review criteria and instantiates multiple reviewer agents with…
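The three stages named above (criterion-specific reviewing, validation through author rebuttal, and severity-based ranking) can be pictured as a simple pipeline. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Weakness` type, the `diagnose` function, and the three callables are hypothetical names introduced for illustration, and in a real system each callable would presumably wrap an LLM agent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Weakness:
    criterion: str          # review criterion that produced this weakness
    text: str               # the weakness statement itself
    severity: float = 0.0   # hypothetical severity score, higher = more serious


# Hypothetical agent interfaces (assumptions, not from the paper):
Reviewer = Callable[[str, str], List[str]]   # (paper, criterion) -> weakness texts
Debate = Callable[[str, Weakness], bool]     # True if the weakness survives rebuttal
Scorer = Callable[[str, Weakness], float]    # (paper, weakness) -> severity


def diagnose(
    paper: str,
    criteria: List[str],
    review: Reviewer,
    survives_debate: Debate,
    score: Scorer,
) -> List[Weakness]:
    """Collect, validate, and rank weaknesses for one paper."""
    # Stage 1: one reviewer pass per criterion.
    candidates = [
        Weakness(criterion=c, text=t)
        for c in criteria
        for t in review(paper, c)
    ]
    # Stage 2: keep only weaknesses that hold up against an author rebuttal.
    validated = [w for w in candidates if survives_debate(paper, w)]
    # Stage 3: score the survivors and return them most severe first.
    for w in validated:
        w.severity = score(paper, w)
    return sorted(validated, key=lambda w: w.severity, reverse=True)
```

Passing the agents in as plain callables keeps the sketch independent of any particular model or prompting scheme; stub functions are enough to exercise the control flow.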
Taxonomy
Topics: Expert finding and Q&A systems · Topic Modeling · Academic integrity and plagiarism
