TL;DR
This paper introduces a comparison-native framework for evaluating scientific papers using LLMs, shifting from absolute scoring to pairwise ranking to improve robustness and generalization across datasets.
Contribution
The authors propose CNPE, a novel framework that integrates comparison into data construction and model training for more reliable paper evaluation.
Findings
Achieves a 21.8% relative improvement over the DeepReview-14B baseline.
Demonstrates robust generalization to five unseen datasets.
Utilizes a graph-based similarity ranking for informative pair sampling.
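The summary does not spell out the graph-based ranking algorithm, so the following is only a minimal sketch of the general idea: link each paper to its most similar neighbours and keep pairs whose scores differ enough to be informative. The function names (`cosine`, `sample_pairs`), the embedding vectors, and the `k` and `min_gap` parameters are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sample_pairs(papers, k=1, min_gap=1.0):
    """Return (better_id, worse_id) pairs of similar papers.

    papers: list of dicts with keys 'id', 'vec' (embedding), 'score'.
    k: number of nearest neighbours considered per paper (assumption).
    min_gap: minimum score difference for a pair to be kept (assumption).
    """
    pairs = set()
    for p in papers:
        # Rank all other papers by similarity to p; keep the top k as graph edges.
        neighbours = sorted(
            (q for q in papers if q["id"] != p["id"]),
            key=lambda q: cosine(p["vec"], q["vec"]),
            reverse=True,
        )[:k]
        for q in neighbours:
            # Similar topic but clearly different quality: an informative pair.
            if abs(p["score"] - q["score"]) >= min_gap:
                better, worse = (p, q) if p["score"] > q["score"] else (q, p)
                pairs.add((better["id"], worse["id"]))
    return sorted(pairs)


papers = [
    {"id": "A", "vec": [1.0, 0.0], "score": 8.0},
    {"id": "B", "vec": [0.9, 0.1], "score": 5.0},
    {"id": "C", "vec": [0.0, 1.0], "score": 6.0},
    {"id": "D", "vec": [0.1, 0.9], "score": 6.0},
]
print(sample_pairs(papers))
```

In this toy data, A and B are close in embedding space but far apart in score, so they form a pair. C and D are equally close but tied in score, so that pair is dropped as uninformative.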
Abstract
Large language models (LLMs) are currently applied to scientific paper evaluation by assigning an absolute score to each paper independently. However, since score scales vary across conferences, time periods, and evaluation criteria, models trained on absolute scores are prone to fitting narrow, context-specific rules rather than developing robust scholarly judgment. To overcome this limitation, we propose shifting paper evaluation from isolated scoring to collaborative ranking. In particular, we design a Comparison-Native framework for Paper Evaluation (CNPE), integrating comparison into both data construction and model learning. We first propose a graph-based similarity ranking algorithm to facilitate the sampling of more informative and discriminative paper pairs from a collection. We then enhance relative quality judgment…
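To illustrate why comparisons can sidestep differing score scales, here is a minimal sketch that turns absolute scores into pairwise preferences. It assumes, purely for illustration, that papers are only compared within the same context (for example, one venue and year), so raw scores from different scales never meet. The abstract does not confirm this design. The function `build_preferences`, the record layout, and the context labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_preferences(records):
    """Convert absolute scores into (preferred_id, other_id) pairs.

    records: iterable of (paper_id, context, score) tuples, where
    'context' is a hypothetical label such as a venue and year.
    Pairs are formed only inside one context (an assumption), so
    score scales from different contexts are never compared directly.
    """
    by_context = defaultdict(list)
    for paper_id, context, score in records:
        by_context[context].append((paper_id, score))

    preferences = []
    for items in by_context.values():
        for (a, score_a), (b, score_b) in combinations(items, 2):
            if score_a == score_b:
                continue  # A tie carries no preference signal.
            preferences.append((a, b) if score_a > score_b else (b, a))
    return preferences


records = [
    ("p1", "venue-X-2024", 6.0),
    ("p2", "venue-X-2024", 4.5),
    ("p3", "venue-Y-2023", 7.0),
    ("p4", "venue-Y-2023", 7.0),
    ("p5", "venue-Y-2023", 5.0),
]
print(build_preferences(records))
```

Each output pair records only which paper was rated higher inside its own context, which is the kind of relative signal a ranking-based model can learn from without memorising any single venue's scale.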
Taxonomy
Topics: Topic Modeling · scientometrics and bibliometrics research · Machine Learning in Materials Science
