A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications

Dongyeop Kang; Waleed Ammar; Bhavana Dalvi; Madeleine van Zuylen; Sebastian Kohlmeier; Eduard Hovy; Roy Schwartz

arXiv:1804.09635·cs.CL·April 26, 2018·33 cites

TL;DR

This paper introduces PeerRead, a comprehensive dataset of peer reviews and paper decisions, enabling NLP research on peer review processes and prediction tasks, with initial baseline results demonstrating its potential.

Contribution

The paper provides the first large, publicly available dataset of peer reviews and paper decisions, along with two NLP tasks (acceptance prediction and review score prediction) and simple baseline models for each.

Findings

01

Simple models can predict paper acceptance with up to 21% error reduction compared to the majority baseline.

02

Simple models outperform the mean baseline on review aspects with high variance.

03

The dataset reveals interesting phenomena in peer review comments.
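
The "error reduction compared to the majority baseline" in finding 01 is a relative measure. The sketch below shows one common way to compute it, under assumptions: the function names are hypothetical, and the label counts and model accuracy are made-up illustration values, not figures from the paper.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def relative_error_reduction(baseline_acc, model_acc):
    """Fraction of the baseline's error that the model removes."""
    baseline_err = 1.0 - baseline_acc
    if baseline_err == 0:
        raise ValueError("baseline has zero error; reduction is undefined")
    model_err = 1.0 - model_acc
    return (baseline_err - model_err) / baseline_err


# Hypothetical data: 60 rejected and 40 accepted papers (NOT from the paper).
labels = ["reject"] * 60 + ["accept"] * 40
baseline_acc = majority_baseline_accuracy(labels)

# Hypothetical model accuracy, chosen only to illustrate the arithmetic.
model_acc = 0.684
reduction = relative_error_reduction(baseline_acc, model_acc)
print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"relative error reduction: {reduction:.1%}")
```

Under this definition, the reduction depends on the class balance of each venue: the same absolute accuracy gain counts for more when the majority baseline is already strong.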

Abstract

Peer reviewing is a central component in the scientific publishing process. We present the first public dataset of scientific peer reviews available for research purposes (PeerRead v1) providing an opportunity to study this important artifact. The dataset consists of 14.7K paper drafts and the corresponding accept/reject decisions in top-tier venues including ACL, NIPS and ICLR. The dataset also includes 10.7K textual peer reviews written by experts for a subset of the papers. We describe the data collection process and report interesting observed phenomena in the peer reviews. We also propose two novel NLP tasks based on this dataset and provide simple baseline models. In the first task, we show that simple models can predict whether a paper is accepted with up to 21% error reduction compared to the majority baseline. In the second task, we predict the numerical scores of review…

Code & Models

Repositories

allenai/PeerRead
Official

Datasets

allenai/peer_read
dataset · 164 dl

Taxonomy

Topics: Topic Modeling · Software Engineering Research · Expert finding and Q&A systems