ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing
Ryan Liu, Nihar B. Shah

TL;DR
This study explores the potential of large language models, especially GPT-4, to assist in scientific paper review tasks, finding they can identify errors and verify checklists but struggle with overall quality assessment.
Contribution
The paper provides an empirical evaluation of GPT-4's capabilities in error detection, checklist verification, and paper comparison, highlighting its strengths and limitations as a review assistant.
Findings
GPT-4 finds the deliberately inserted error in 7 of 13 short computer science test papers
GPT-4 achieves 86.6% accuracy on checklist verification
GPT-4 struggles to reliably judge which paper is better when comparing pairs of abstracts
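To put the error-detection finding on the same percentage scale as the checklist accuracy, the count can be converted to a rate. The sketch below is illustrative only: the helper name `detection_rate` is hypothetical and not from the paper, and the only inputs are the 7 and 13 reported above.

```python
def detection_rate(detected: int, total: int) -> float:
    """Return the fraction of test papers whose inserted error was found.

    Hypothetical helper for illustration; not part of the paper's code.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= detected <= total:
        raise ValueError("detected must be between 0 and total")
    return detected / total


# Counts reported in the findings: errors found in 7 of 13 constructed papers.
rate = detection_rate(7, 13)
print(f"Error detection rate: {rate:.1%}")  # prints "Error detection rate: 53.8%"
```

With only 13 papers, this rate is a coarse estimate, so it is best read alongside the raw count rather than instead of it.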
Abstract
Given the rapid ascent of large language models (LLMs), we study the question: (How) can large language models help in reviewing of scientific papers or proposals? We first conduct some pilot studies where we find that (i) GPT-4 outperforms other LLMs (Bard, Vicuna, Koala, Alpaca, LLaMa, Dolly, OpenAssistant, StableLM), and (ii) prompting with a specific question (e.g., to identify errors) outperforms prompting to simply write a review. With these insights, we study the use of LLMs (specifically, GPT-4) for three tasks: 1. Identifying errors: We construct 13 short computer science papers each with a deliberately inserted error, and ask the LLM to check for the correctness of these papers. We observe that the LLM finds errors in 7 of them, spanning both mathematical and conceptual errors. 2. Verifying checklists: We task the LLM to verify 16 closed-ended checklist questions in the…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Label Smoothing · Layer Normalization · Byte Pair Encoding · Softmax · Adam · Absolute Position Encodings
