Reinforcement Learning with Conditional Expectation Reward

Changyi Xiao; Caijun Xu; Yixin Cao

arXiv:2603.10624 · cs.LG · March 12, 2026

TL;DR

This paper introduces Conditional Expectation Reward (CER), a reward for reinforcement learning of language models that uses the model itself as an implicit verifier, so it applies to general reasoning tasks without handcrafted verification rules or external verifiers.

Contribution

The paper proposes CER, a soft reward that replaces domain-specific verifiers with the language model's own likelihood of the reference answer, extending reinforcement learning with verifiable rewards to domains with free-form answers.

Findings

01. CER improves reasoning performance across mathematical and general tasks.

02. CER provides a graded reward signal reflecting answer correctness.

03. Experimental results validate CER's effectiveness and flexibility.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective in enhancing the reasoning capabilities of large language models, particularly in domains such as mathematics where reliable rule-based verifiers can be constructed. However, the reliance on handcrafted, domain-specific verification rules substantially limits the applicability of RLVR to general reasoning domains with free-form answers, where valid answers often exhibit significant variability, making it difficult to establish complete and accurate rules. To address this limitation, we propose Conditional Expectation Reward (CER), which leverages the large language model itself as an implicit verifier, and is therefore applicable to general domains and eliminates the need for external verifiers or auxiliary models. CER is defined as the expected likelihood of generating the reference answer conditioned on the…
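
To make the contrast between a rule-based verifier and a likelihood-based soft reward concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract is truncated before it states what the expectation is conditioned on, so the sketch leaves that open and simply averages the reference answer's likelihood over a set of sampled conditioning contexts. The function names and the input format (per-token log-probabilities of the reference answer, one list per sampled context) are hypothetical.

```python
import math
from typing import Sequence


def sequence_likelihood(token_logprobs: Sequence[float]) -> float:
    """Likelihood of a token sequence, given its per-token log-probabilities."""
    return math.exp(sum(token_logprobs))


def expected_likelihood_reward(samples: Sequence[Sequence[float]]) -> float:
    """Monte Carlo estimate of an expected likelihood, used as a soft reward.

    Assumption: each element of `samples` holds the per-token log-probs that
    the model assigns to the reference answer under one sampled conditioning
    context. What that context is in CER is not specified in the truncated
    abstract, so it is left abstract here.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return sum(sequence_likelihood(lp) for lp in samples) / len(samples)


def exact_match_reward(prediction: str, reference: str) -> float:
    """Rule-based baseline for comparison: binary reward from string equality."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0


if __name__ == "__main__":
    # Two hypothetical contexts: a two-token reference scored at 0.5 per token,
    # and a one-token reference scored at probability 1.0.
    logprobs = [[math.log(0.5), math.log(0.5)], [math.log(1.0)]]
    print(expected_likelihood_reward(logprobs))
    print(exact_match_reward("42", "forty-two"))
```

The averaged likelihood always lies in [0, 1], which is what makes it a graded signal, whereas the exact-match baseline can only return 0 or 1 and rejects valid answers that are phrased differently from the reference.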

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications