Reward Modeling for Scientific Writing Evaluation

Furkan Şahinuç; Subhabrata Dutta; Iryna Gurevych

arXiv:2601.11374·cs.CL·April 20, 2026

TL;DR

This paper introduces cost-effective, open-source reward models designed for evaluating scientific writing, capable of generalizing across diverse tasks and criteria without task-specific retraining.

Contribution

It presents a novel two-stage training framework and multi-aspect evaluation design to improve scientific writing assessment by LLM-based reward models.

Findings

01. The proposed models outperform existing baselines in scientific writing evaluation.

02. The approach generalizes well to unseen evaluation settings.

03. Training improves reasoning and multi-aspect assessment capabilities.

Abstract

Scientific writing is an expert-domain task that demands deep domain knowledge, task-specific requirements and reasoning capabilities that leverage the domain knowledge to satisfy the task specifications. While scientific text generation has been widely studied, its evaluation remains a challenging and open problem. It is critical to develop models that can be reliably deployed for evaluating diverse open-ended scientific writing tasks while adhering to their distinct requirements. However, existing LLM-based judges and reward models are primarily optimized for general-purpose benchmarks with fixed scoring rubrics and evaluation criteria. Consequently, they often fail to reason over sparse knowledge of scientific domains when interpreting task-dependent and multi-faceted criteria. Moreover, fine-tuning for each individual task is costly and impractical for low-resource settings. To…
