GPT vs Human for Scientific Reviews: A Dual Source Review on Applications of ChatGPT in Science
Chenxi Wu, Alan John Varghese, Vivek Oommen, George Em Karniadakis

TL;DR
This study compares scientific reviews written by a human reviewer and by SciSpace, a large language model, as judged by GPT-3.5, a crowd panel, and GPT-4. The LLM reviews partly agree with the human ones and score well on structure and clarity. However, LLMs still have limited understanding of complex methodologies and ethical considerations.
Contribution
It provides a comprehensive evaluation of GPT models' performance in scientific reviews, highlighting their strengths and current limitations compared to human reviewers.
Findings
50% of SciSpace responses align with human reviews on objective questions
GPT-4 often rates human reviews higher in accuracy
SciSpace scores higher in structure, clarity, and completeness
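The 50% figure is an agreement rate over objective questions. As a rough illustration only, a simple percent-agreement measure can be sketched as below. The sketch assumes that each reviewer gives one categorical answer (for example yes/no) to each objective question. The function name `percent_agreement` and the sample answers are hypothetical, and the paper may measure alignment differently.

```python
# Hypothetical sketch: simple percent agreement between two reviewers'
# answers to the same objective questions. This is NOT necessarily how
# the paper measured alignment.


def percent_agreement(answers_a: list[str], answers_b: list[str]) -> float:
    """Return the fraction of questions on which both reviewers gave the same answer."""
    if len(answers_a) != len(answers_b):
        raise ValueError("both reviewers must answer the same questions")
    if not answers_a:
        raise ValueError("need at least one question")
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    return matches / len(answers_a)


# Made-up illustrative answers, not data from the paper.
human = ["yes", "no", "yes", "yes"]
llm = ["yes", "yes", "yes", "no"]
print(percent_agreement(human, llm))  # 0.5
```

Raw percent agreement does not correct for agreement expected by chance. Chance-corrected statistics such as Cohen's kappa are a common alternative.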
Abstract
The new polymath Large Language Models (LLMs) can greatly speed up scientific reviews. They may do this by using less biased quantitative metrics, facilitating cross-disciplinary connections, and identifying emerging trends and research gaps by analyzing large volumes of data. At present, however, they lack the deep understanding of complex methodologies that reviewing requires, have difficulty evaluating innovative claims, and are unable to assess ethical issues and conflicts of interest. Herein, we consider 13 GPT-related papers across different scientific domains. Each paper was reviewed by a human reviewer and by SciSpace, a large language model, and the reviews were evaluated by three distinct types of evaluators, namely GPT-3.5, a crowd panel, and GPT-4. We found that 50% of SciSpace's responses to objective questions align with those of a human reviewer, with GPT-4 (informed evaluator) often rating the…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Softmax · Cosine Annealing · Multi-Head Attention · Adam · Absolute Position Encodings · Layer Normalization
