Decoupling Scores and Text: The Politeness Principle in Peer Review
Yingxuan Wen

TL;DR
This study compares numerical scores and textual reviews as predictors of acceptance in peer review. It finds that scores are more reliable because politeness in review text (the Politeness Principle) masks the true rejection signals.
Contribution
It introduces a dataset of over 30,000 ICLR 2021-2025 submissions and an analysis showing that text reviews are a limited predictor of acceptance outcomes, while scores dominate.
Findings
Score-based models achieve 91% accuracy, outperforming text-based models at 81%.
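Among the 9% of samples that score-based models mispredict, score distributions show high kurtosis and negative skewness, suggesting that individual low scores can be decisive for rejection even when the average is near the borderline.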
Politeness in reviews masks rejection signals, making outcomes harder to infer from text.
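To make the skewness and kurtosis finding concrete, here is a minimal sketch of how the two statistics respond to a single low score. It is an illustration only, not the paper's code: the function name `skewness_and_kurtosis` and both score sets are hypothetical, and the moment-based (population) definitions are an assumption, since the page does not say which estimator the authors use.

```python
from statistics import fmean


def skewness_and_kurtosis(scores):
    """Return (skewness, kurtosis) from population central moments.

    Kurtosis is the plain (non-excess) version, so a normal
    distribution would score 3.0. Assumed definitions, not the paper's.
    """
    mean = fmean(scores)
    deviations = [s - mean for s in scores]
    m2 = fmean([d ** 2 for d in deviations])  # variance
    if m2 == 0:
        raise ValueError("all scores are identical; moments are undefined")
    m3 = fmean([d ** 3 for d in deviations])
    m4 = fmean([d ** 4 for d in deviations])
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical review scores: four borderline-positive reviews and one
# clear reject, versus a symmetric spread around a similar average.
one_low_outlier = [6, 6, 6, 6, 3]
symmetric_spread = [3, 5, 5, 5, 7]

skew_a, kurt_a = skewness_and_kurtosis(one_low_outlier)
skew_b, kurt_b = skewness_and_kurtosis(symmetric_spread)

print(f"one low outlier:  skew={skew_a:.2f}, kurtosis={kurt_a:.2f}")
print(f"symmetric spread: skew={skew_b:.2f}, kurtosis={kurt_b:.2f}")
```

In this toy case the set with one dissenting low score has negative skewness and higher kurtosis than the symmetric set, even though the two averages are close (5.4 and 5.0). That is the pattern the finding describes.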
Abstract
Authors often struggle to interpret peer review feedback, deriving false hope from polite comments or feeling confused by specific low scores. To investigate this, we construct a dataset of over 30,000 ICLR 2021-2025 submissions and compare acceptance prediction performance using numerical scores versus text reviews. Our experiments reveal a significant performance gap: score-based models achieve 91% accuracy, while text-based models reach only 81% even with large language models, indicating that textual information is considerably less reliable. To explain this phenomenon, we first analyze the 9% of samples that score-based models fail to predict, finding their score distributions exhibit high kurtosis and negative skewness, which suggests that individual low scores play a decisive role in rejection even when the average score falls near the borderline. We then examine why text-based…
