Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers
Zhijian Xu, Yilun Zhao, Manasi Patwardhan, Lovekesh Vig, Arman Cohan

TL;DR
This paper introduces LimitGen, a benchmark for evaluating LLMs' ability to identify scientific research limitations, and demonstrates that augmenting LLMs with literature retrieval improves their performance in providing constructive peer review feedback.
Contribution
The paper presents the first comprehensive benchmark for evaluating LLMs' ability to identify research limitations and shows that augmenting LLMs with literature retrieval improves that ability.
Findings
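The LimitGen benchmark has two subsets: LimitGen-Syn, a synthetic dataset built through controlled perturbations of high-quality papers, and LimitGen-Human, a collection of real human-written limitations.
Augmenting LLMs with literature retrieval improves their ability to identify limitations.
With this augmentation, LLMs can give more constructive early-stage feedback that complements human peer review.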
Abstract
Peer review is fundamental to scientific research, but the growing volume of publications has intensified the challenges of this expertise-intensive process. While LLMs show promise in various scientific tasks, their potential to assist with peer review, particularly in identifying paper limitations, remains understudied. We first present a comprehensive taxonomy of limitation types in scientific research, with a focus on AI. Guided by this taxonomy, we present LimitGen, the first comprehensive benchmark for evaluating LLMs' capability to support early-stage feedback and complement human peer review. Our benchmark consists of two subsets: LimitGen-Syn, a synthetic dataset carefully created through controlled perturbations of high-quality papers, and LimitGen-Human, a collection of real human-written limitations. To improve the ability of LLM systems to identify…
