ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training

Yu Liang; Liangxin Liu; Longzheng Wang; Yan Wang; Yueyang Zhang; Long Xia; Zhiyuan Sun; Daiting Shi

arXiv:2604.07484·cs.AI·April 21, 2026

TL;DR

ConsistRM is a self-training framework that improves generative reward models with consistency-aware rewards, removing the need for human annotations while improving training stability and output consistency.

Contribution

ConsistRM introduces a self-training method whose consistency-aware rewards stabilize training and improve alignment without human-labeled data.

Findings

1. Outperforms vanilla RFT by 1.5% on benchmark datasets.

2. Enhances output consistency and reduces position bias.

3. Provides stable pseudo-labels through temporal consistency.

Abstract

Generative reward models (GRMs) have emerged as a promising approach for aligning Large Language Models (LLMs) with human preferences by offering greater representational capacity and flexibility than traditional scalar reward models. However, GRMs face two major challenges: reliance on costly human-annotated data restricts scalability, and self-training approaches often suffer from instability and vulnerability to reward hacking. To address these issues, we propose ConsistRM, a self-training framework that enables effective and stable GRM training without human annotations. ConsistRM incorporates the Consistency-Aware Answer Reward, which produces reliable pseudo-labels with temporal consistency, thereby providing more stable model optimization. Moreover, the Consistency-Aware Critique Reward is introduced to assess semantic consistency across multiple critiques and allocates…
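The abstract says the Consistency-Aware Answer Reward builds pseudo-labels "with temporal consistency" but does not spell out the computation here. The sketch below is one plausible reading, not the paper's algorithm: it assumes the GRM is sampled several times per prompt, that each sample yields a preference vote ("A" or "B"), and that the pseudo-label is the most frequent majority vote over a short window of recent training steps. The names `majority_label`, `TemporalPseudoLabeler`, `answer_reward`, and the `window` parameter are all hypothetical.

```python
from collections import Counter, deque


def majority_label(votes):
    """Return (label, agreement ratio) for votes such as ["A", "B", "A"]."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


class TemporalPseudoLabeler:
    """Hypothetical helper: remembers recent majority labels per prompt."""

    def __init__(self, window=3):
        self.window = window  # assumed number of past steps to keep
        self.history = {}     # prompt_id -> deque of recent majority labels

    def update(self, prompt_id, votes):
        """Record this step's majority vote and return the pseudo-label."""
        label, _ = majority_label(votes)
        past = self.history.setdefault(prompt_id, deque(maxlen=self.window))
        past.append(label)
        # Assumption: the pseudo-label is the label that won most often
        # across the retained steps, which damps single-step flips.
        return Counter(past).most_common(1)[0][0]


def answer_reward(prediction, pseudo_label):
    """Binary reward: 1.0 if a sampled answer matches the pseudo-label."""
    return 1.0 if prediction == pseudo_label else 0.0


labeler = TemporalPseudoLabeler(window=3)
labeler.update("prompt-1", ["A", "A", "B"])
labeler.update("prompt-1", ["A", "A", "A"])
stable = labeler.update("prompt-1", ["B", "B", "A"])  # one noisy step
rewards = [answer_reward(v, stable) for v in ["B", "B", "A"]]
```

In this toy run the third step's own majority is "B", yet the windowed label stays at "A", so only the sample that agrees with the longer-running label is rewarded. Whether ConsistRM uses a window, a moving average, or something else entirely is not stated in the truncated abstract.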

Code & Models

Repositories

yuliangCarmelo/ConsistRM (GitHub)
