Triviality Corrected Endogenous Reward

Xinda Wang; Zhengxu Hou; Yangshijie Zhang; Bingren Yan; Jialin Liu; Chenzhuo Zhao; Zhibo Yang; Bin-Bin Yang; Feng Xiao

arXiv:2604.11522 · cs.CL · April 14, 2026

TL;DR

This paper introduces TCER (Triviality Corrected Endogenous Reward), a reward mechanism for reinforcement learning in open-ended text generation that mitigates triviality bias and improves output diversity and content quality.

Contribution

We propose TCER, a new endogenous reward method that rewards information gain relative to a reference policy, addressing triviality bias in open-ended text generation.

Findings

01

TCER improves diversity and content quality in open-ended text generation.

02

TCER outperforms baseline methods across multiple benchmarks.

03

TCER effectively transfers to mathematical reasoning tasks.

Abstract

Reinforcement learning for open-ended text generation is constrained by the lack of verifiable rewards, necessitating reliance on judge models that require either annotated data or powerful closed-source models. Inspired by recent work on unsupervised reinforcement learning for mathematical reasoning using confidence-based endogenous rewards, we investigate whether this principle can be adapted to open-ended writing tasks. We find that directly applying confidence rewards leads to Triviality Bias: the policy collapses toward high-probability outputs, reducing diversity and meaningful content. We propose TCER (Triviality Corrected Endogenous Reward), which addresses this bias by rewarding the relative information gain between a specialist policy and a generalist reference policy, modulated by a probability-dependent correction mechanism. Across multiple writing benchmarks and model…
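The contrast the abstract draws between a plain confidence reward and a reference-relative reward can be illustrated with a toy sketch. This is not the paper's formula, which the abstract does not state: the function names, the `(1 - p)` weighting, and the token probabilities below are all assumptions chosen only to show the idea of rewarding gain over a generalist reference while discounting near-certain tokens.

```python
import math

def confidence_reward(policy_logps):
    """Plain confidence reward: mean token log-probability under the policy."""
    return sum(policy_logps) / len(policy_logps)

def corrected_gain_reward(policy_logps, ref_logps):
    """Hypothetical stand-in for a triviality-corrected reward (an assumption,
    not the paper's definition): the per-token log-ratio between the policy and
    the reference, weighted by (1 - p_policy) so that tokens the policy is
    already nearly certain about contribute little."""
    total = 0.0
    for lp, lr in zip(policy_logps, ref_logps):
        weight = 1.0 - math.exp(lp)   # assumed probability-dependent correction
        total += weight * (lp - lr)   # log-ratio as a proxy for information gain
    return total / len(policy_logps)

# Made-up token probabilities for two four-token continuations.
# "Trivial": both models find every token almost certain.
trivial_policy = [math.log(0.99)] * 4
trivial_ref = [math.log(0.95)] * 4
# "Informative": the policy is moderately confident where the reference is not.
informative_policy = [math.log(0.6)] * 4
informative_ref = [math.log(0.2)] * 4

if __name__ == "__main__":
    print("confidence:", confidence_reward(trivial_policy),
          confidence_reward(informative_policy))
    print("corrected: ", corrected_gain_reward(trivial_policy, trivial_ref),
          corrected_gain_reward(informative_policy, informative_ref))
```

In this toy setting the confidence reward ranks the trivial continuation higher, which is the collapse toward high-probability outputs the abstract calls Triviality Bias, while the reference-relative score ranks the informative continuation higher.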
