Reward Models Identify Consistency, Not Causality

Yuhui Xu; Hanze Dong; Lei Wang; Caiming Xiong; Junnan Li

arXiv:2502.14619 · cs.LG · February 21, 2025

TL;DR

This paper reveals that reward models for language models focus more on structural consistency and reasoning patterns than on actual causal correctness, highlighting a key limitation in current alignment techniques.

Contribution

The study demonstrates that reward models prioritize structural consistency over causal correctness, challenging common assumptions about reward model behavior and suggesting the need for causality-aware reward modeling approaches.

Findings

01. Reward models rely heavily on structural consistency.

02. Removing problem statements minimally affects reward scores.

03. Disrupting reasoning flow significantly impacts reward outputs.

Abstract

Reward models (RMs) play a crucial role in aligning large language models (LLMs) with human preferences and enhancing reasoning quality. Traditionally, RMs are trained to rank candidate outputs based on their correctness and coherence. However, in this work, we present several surprising findings that challenge common assumptions about RM behavior. Our analysis reveals that state-of-the-art reward models prioritize structural consistency over causal correctness. Specifically, removing the problem statement has minimal impact on reward scores, whereas altering numerical values or disrupting the reasoning flow significantly affects RM outputs. Furthermore, RMs exhibit a strong dependence on complete reasoning trajectories: truncated or incomplete steps lead to significant variations in reward assignments, indicating that RMs primarily rely on learned reasoning patterns rather than explicit…
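
The perturbations named in the abstract (removing the problem statement, altering numerical values, disrupting the reasoning flow, truncating steps) can be illustrated with a small sketch. This is not the authors' code: the example question, the function names, the rotate-by-one reordering, and the keep-the-first-half truncation are all assumptions chosen for illustration, and `score_fn` is a hypothetical stand-in for whatever reward model is being probed.

```python
import random
import re
from typing import Callable, Dict, List

# Hypothetical toy example; not taken from the paper.
QUESTION = "Tom has 3 apples and buys 4 more. How many apples does he have?"
STEPS = [
    "Tom starts with 3 apples.",
    "He buys 4 more, so 3 + 4 = 7.",
    "The answer is 7.",
]


def alter_numbers(steps: List[str], rng: random.Random) -> List[str]:
    """Add a random offset in [1, 9] to every integer, so each number changes."""
    def bump(match: re.Match) -> str:
        return str(int(match.group()) + rng.randint(1, 9))
    return [re.sub(r"\d+", bump, step) for step in steps]


def reorder_steps(steps: List[str]) -> List[str]:
    """Rotate the steps by one position (an assumed way to break the flow)."""
    return steps[1:] + steps[:1]


def build_variants(question: str, steps: List[str], seed: int = 0) -> Dict[str, str]:
    """Return the original input plus one perturbed text per perturbation."""
    rng = random.Random(seed)

    def render(q: str, s: List[str]) -> str:
        # An empty question means the problem statement is omitted entirely.
        return "\n".join(([q] if q else []) + s)

    return {
        "original": render(question, steps),
        "no_question": render("", steps),
        "altered_numbers": render(question, alter_numbers(steps, rng)),
        "reordered": render(question, reorder_steps(steps)),
        "truncated": render(question, steps[: max(1, len(steps) // 2)]),
    }


def score_gaps(variants: Dict[str, str], score_fn: Callable[[str], float]) -> Dict[str, float]:
    """Score every variant and report its difference from the original's score."""
    base = score_fn(variants["original"])
    return {name: score_fn(text) - base for name, text in variants.items()}
```

To use it, wrap a reward model in a function that maps a text to a scalar and pass it as `score_fn`, for example `score_gaps(build_variants(QUESTION, STEPS), my_reward_model)`, where `my_reward_model` is a placeholder. Under the paper's claims one would expect the `no_question` gap to be small relative to the `altered_numbers`, `reordered`, and `truncated` gaps, but the sketch itself makes no such guarantee.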

Taxonomy

Topics: Economic Policies and Impacts