Can Large Language Models Be Trusted Paper Reviewers? A Feasibility Study
Chuanlei Li, Xu Hu, Minghui Xu, Kun Li, Yue Zhang, Xiuzhen Cheng

TL;DR
This study investigates the potential of using Large Language Models to automate academic paper reviews, demonstrating significant reductions in review time and cost but highlighting limitations in judgment accuracy.
Contribution
The paper proposes an automated review system integrating advanced LLM techniques and evaluates its effectiveness and limitations on real conference submissions.
Findings
LLM-based review takes 2.48 hours on average
Review cost averages $104.28 (USD)
Low similarity (38.6%) between LLM-selected and accepted papers
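This summary does not say how the similarity figure is calculated. As a rough illustration only, the sketch below shows two common ways to compare a set of LLM-selected papers against a set of accepted papers: the fraction of accepted papers the LLM also selected, and the Jaccard index. The function names and the paper IDs are hypothetical, and neither measure is claimed to be the one the authors used.

```python
def selection_overlap(llm_selected, accepted):
    """Fraction of accepted papers that also appear in the LLM's selection.

    Assumption: this is one plausible reading of "similarity"; the source
    summary does not define the metric.
    """
    llm_selected, accepted = set(llm_selected), set(accepted)
    if not accepted:
        raise ValueError("accepted set must not be empty")
    return len(llm_selected & accepted) / len(accepted)


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical by convention.
    return len(a & b) / len(union) if union else 1.0


# Hypothetical paper IDs for illustration, not data from the study.
accepted = {"p01", "p02", "p03", "p04", "p05"}
llm_selected = {"p02", "p05", "p07", "p08", "p09"}

overlap = selection_overlap(llm_selected, accepted)  # 2 shared / 5 accepted
jac = jaccard(llm_selected, accepted)                # 2 shared / 8 in union

print(f"overlap with accepted set: {overlap:.1%}")
print(f"Jaccard index: {jac:.1%}")
```

The two measures can differ noticeably on the same data, so a reported percentage is only interpretable once the definition behind it is known.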
Abstract
Academic paper review typically requires substantial time, expertise, and human resources. Large Language Models (LLMs) present a promising method for automating the review process due to their extensive training data, broad knowledge base, and relatively low usage cost. This work explores the feasibility of using LLMs for academic paper review by proposing an automated review system. The system integrates Retrieval Augmented Generation (RAG), the AutoGen multi-agent system, and Chain-of-Thought prompting to support tasks such as format checking, standardized evaluation, comment generation, and scoring. Experiments conducted on 290 submissions from the WASA 2024 conference using GPT-4o show that LLM-based review significantly reduces review time (average 2.48 hours) and cost (average $104.28 USD). However, the similarity between LLM-selected papers and actual accepted papers remains…
