ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing

Ryan Liu; Nihar B. Shah

arXiv:2306.00622 · cs.CL · June 2, 2023 · 26 citations

Open Access

TL;DR

This study explores the potential of large language models, especially GPT-4, to assist in scientific paper review tasks, finding they can identify errors and verify checklists but struggle with overall quality assessment.

Contribution

The paper provides an empirical evaluation of GPT-4's capabilities in error detection, checklist verification, and paper comparison, highlighting its strengths and limitations as a review assistant.

Findings

1. GPT-4 detects the deliberately inserted error in 7 of 13 short computer science papers

2. GPT-4 achieves 86.6% accuracy on checklist verification

3. GPT-4 does not reliably identify the better paper when comparing pairs of abstracts
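As a quick arithmetic check on the first finding, the sketch below converts the 7-of-13 count into a detection rate. The function name `detection_rate` is illustrative and does not come from the paper; only the counts 7 and 13 are taken from the text above.

```python
def detection_rate(detected: int, total: int) -> float:
    """Return the fraction of papers in which the inserted error was found.

    Hypothetical helper for illustration; not part of the paper's code.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= detected <= total:
        raise ValueError("detected must be between 0 and total")
    return detected / total


# Counts reported in the findings: errors found in 7 of 13 papers.
rate = detection_rate(7, 13)
print(f"Error detection rate: {rate:.1%}")  # Error detection rate: 53.8%
```

Put differently, the error went undetected in 6 of the 13 papers, so the result supports using the model as an assistant to reviewers rather than as a replacement for them.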

Abstract

Given the rapid ascent of large language models (LLMs), we study the question: (How) can large language models help in reviewing of scientific papers or proposals? We first conduct some pilot studies where we find that (i) GPT-4 outperforms other LLMs (Bard, Vicuna, Koala, Alpaca, LLaMa, Dolly, OpenAssistant, StableLM), and (ii) prompting with a specific question (e.g., to identify errors) outperforms prompting to simply write a review. With these insights, we study the use of LLMs (specifically, GPT-4) for three tasks: 1. Identifying errors: We construct 13 short computer science papers each with a deliberately inserted error, and ask the LLM to check for the correctness of these papers. We observe that the LLM finds errors in 7 of them, spanning both mathematical and conceptual errors. 2. Verifying checklists: We task the LLM to verify 16 closed-ended checklist questions in the…

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Label Smoothing · Layer Normalization · Byte Pair Encoding · Softmax · Adam · Absolute Position Encodings