RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

Haozhe Wang; Cong Wei; Weiming Ren; Jiaming Liu; Fangzhen Lin; Wenhu Chen

arXiv:2604.11626 · cs.AI · April 15, 2026

TL;DR

RationalRewards is a reward model for visual generation that writes an explicit, multi-dimensional critique before it scores an output. The structured rationales serve as interpretable rewards at training time and drive output refinement at test time.
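As a rough illustration of what "critique before scoring" could look like as data, the sketch below stores one rationale and one score per dimension and collapses them into a single scalar reward. The dimension names, the [0, 1] score range, the weights, and the weighted-mean aggregation are all assumptions made for this example; the page does not specify how RationalRewards combines its dimensions.

```python
from dataclasses import dataclass


@dataclass
class Critique:
    """Hypothetical container for one structured critique."""
    rationales: dict[str, str]  # dimension name -> written explanation
    scores: dict[str, float]    # dimension name -> score, assumed to lie in [0, 1]


def scalar_reward(critique: Critique, weights: dict[str, float]) -> float:
    """Weighted mean of the per-dimension scores (an assumed aggregation rule)."""
    total_weight = sum(weights[dim] for dim in critique.scores)
    weighted_sum = sum(weights[dim] * score for dim, score in critique.scores.items())
    return weighted_sum / total_weight


# Dimension names and weights below are illustrative, not taken from the paper.
critique = Critique(
    rationales={
        "prompt_alignment": "The requested second object is missing.",
        "visual_quality": "Sharp, with no visible artifacts.",
    },
    scores={"prompt_alignment": 0.5, "visual_quality": 1.0},
)
weights = {"prompt_alignment": 3.0, "visual_quality": 1.0}

reward = scalar_reward(critique, weights)
print(reward)  # (3.0 * 0.5 + 1.0 * 1.0) / 4.0 = 0.625
```

Keeping the rationale text next to each score is what makes the scalar inspectable: a low reward can be traced back to the dimension that caused it.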

Contribution

It presents Preference-Anchored Rationalization (PARROT), a framework that recovers high-quality rationales from readily available preference data instead of costly rationale annotations. This enables improved reward modeling with less data and better generator performance.

Findings

01. RationalRewards achieves state-of-the-art preference prediction among open-source models.

02. The critique-refine loop improves generator outputs without parameter updates.

03. Structured reasoning unlocks latent capabilities in visual generators.

Abstract

Most reward models for visual generation reduce rich human judgments to a single unexplained score, discarding the reasoning that underlies preference. We show that teaching reward models to produce explicit, multi-dimensional critiques before scoring transforms them from passive evaluators into active optimization tools, improving generators in two complementary ways: at training time, structured rationales provide interpretable, fine-grained rewards for reinforcement learning; at test time, a Generate-Critique-Refine loop turns critiques into targeted prompt revisions that improve outputs without any parameter updates. To train such a reward model without costly rationale annotations, we introduce Preference-Anchored Rationalization (PARROT), a principled framework that recovers high-quality rationales from readily available preference data through anchored generation, consistency…
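The test-time Generate-Critique-Refine loop described in the abstract can be sketched as plain control flow. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the callable signatures, the stopping threshold, the round limit, and the choice to critique each output against the original prompt are all hypothetical, and the toy stubs stand in for a real generator and reward model.

```python
from typing import Callable


def generate_critique_refine(
    prompt: str,
    generate: Callable[[str], str],                     # prompt -> output (stand-in for an image)
    critique: Callable[[str, str], tuple[float, str]],  # (original prompt, output) -> (score, feedback)
    refine: Callable[[str, str], str],                  # (current prompt, feedback) -> revised prompt
    max_rounds: int = 3,                                # assumed budget
    target_score: float = 0.9,                          # assumed stopping threshold
) -> tuple[str, str, float]:
    """Return (best_prompt, best_output, best_score). Only the prompt changes;
    no generator parameters are updated."""
    best = (prompt, "", float("-inf"))
    current = prompt
    for _ in range(max_rounds):
        output = generate(current)
        # Assumption: judge every output against the user's original request.
        score, feedback = critique(prompt, output)
        if score > best[2]:
            best = (current, output, score)
        if score >= target_score:
            break
        current = refine(current, feedback)
    return best


# Toy stubs so the loop runs end to end; all three are hypothetical.
def toy_generate(p: str) -> str:
    return f"image<{p}>"


def toy_critique(original: str, output: str) -> tuple[float, str]:
    if "red" in output:
        return 1.0, "no issues found"
    return 0.4, "state the color explicitly: red"


def toy_refine(p: str, feedback: str) -> str:
    return f"{p}, red"


result = generate_critique_refine("a car", toy_generate, toy_critique, toy_refine)
print(result)  # ('a car, red', 'image<a car, red>', 1.0)
```

Tracking the best-scoring round rather than returning the last one guards against a revision that makes the output worse.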

Code & Models

Repositories

tiger-ai-lab/RationalRewards (GitHub)