COMPARE: A Taxonomy and Dataset of Comparison Discussions in Peer Reviews
Shruti Singh, Mayank Singh, Pawan Goyal

TL;DR
COMPARE introduces a taxonomy and dataset for comparison discussions in peer reviews of experimental deep learning papers, enabling analysis of review content and development of models to identify comparison sentences.
Contribution
It provides a taxonomy and an annotated dataset of comparison discussions in peer reviews, along with two language models pretrained on paper abstracts and reviews to learn representations of review text.
Findings
Annotated 1,800 sentences across 117 reviews.
Achieved a maximum F1 score of 0.49 in identifying comparison sentences.
Pretrained two language models on ML, NLP, and CV paper abstracts and reviews.
Abstract
Comparing research papers is a conventional method to demonstrate progress in experimental research. We present COMPARE, a taxonomy and a dataset of comparison discussions in peer reviews of research papers in the domain of experimental deep learning. From a thorough observation of a large set of review sentences, we build a taxonomy of categories in comparison discussions and present a detailed annotation scheme to analyze this. Overall, we annotate 117 reviews covering 1,800 sentences. We experiment with various methods to identify comparison sentences in peer reviews and report a maximum F1 Score of 0.49. We also pretrain two language models specifically on ML, NLP, and CV paper abstracts and reviews to learn informative representations of peer reviews. The annotated dataset and the pretrained models are available at https://github.com/shruti-singh/COMPARE .
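For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1 score is computed for a binary "comparison sentence vs. other" classifier. It assumes binary F1 on the positive class; the abstract does not state which F1 variant (binary, macro, or micro) the authors report. The function name and the toy labels are hypothetical and are not taken from the COMPARE dataset.

```python
def f1_score(gold, pred):
    """Binary F1 for the positive class (1 = comparison sentence)."""
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only (not from the COMPARE dataset).
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 1, 0, 1, 0, 0]
score = f1_score(gold, pred)
print(round(score, 3))
```

Because F1 is the harmonic mean of precision and recall, a score of 0.49 indicates that at least one of the two is low, which fits the abstract's framing of comparison-sentence identification as a hard task.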
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Software Engineering Research
