Diffusion Reinforcement Learning via Centered Reward Distillation

Yuanzhi Zhu; Xi Wang; Stéphane Lathuilière; Vicky Kalogeiton

arXiv:2603.14128·cs.CV·March 17, 2026

Open Access

TL;DR

This paper introduces Centered Reward Distillation (CRD), a diffusion RL framework for fine-tuning text-to-image models that controls distribution drift and reduces reward hacking while achieving reward optimization results competitive with the state of the art.

Contribution

CRD is a diffusion RL method that combines within-prompt centering, decoupled sampling, KL anchoring, and adaptive KL strength to improve the stability and performance of fine-tuning.

Findings

01

CRD achieves reward optimization results competitive with the state of the art (SOTA).

02

CRD converges faster than previous methods.

03

CRD reduces reward hacking in text-to-image fine-tuning.

Abstract

Diffusion and flow models achieve state-of-the-art (SOTA) generative performance, yet many practically important behaviors such as fine-grained prompt fidelity, compositional correctness, and text rendering are weakly specified by score or flow matching pretraining objectives. Reinforcement Learning (RL) fine-tuning with external, black-box rewards is a natural remedy, but diffusion RL is often brittle. Trajectory-based methods incur high memory cost and high-variance gradient estimates; forward-process approaches converge faster but can suffer from distribution drift, and hence reward hacking. In this work, we present Centered Reward Distillation (CRD), a diffusion RL framework derived from KL-regularized reward maximization built on forward-process-based fine-tuning. The key insight is that the intractable normalizing constant cancels under within-prompt centering,…
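The cancellation mentioned above can be illustrated with a small numerical sketch. It relies on the standard result for KL-regularized reward maximization: the optimal distribution satisfies p*(x|c) ∝ p_ref(x|c) exp(r(x,c)/β), so r(x,c) = β log(p*(x|c)/p_ref(x|c)) + β log Z(c), where Z(c) depends only on the prompt c. Subtracting the mean over several samples drawn for the same prompt therefore removes the β log Z(c) term. The abstract is truncated here, so this is not the paper's actual loss; the function name `center_within_prompt`, the value of `beta`, and the random stand-in arrays are all assumptions made for illustration.

```python
import numpy as np

def center_within_prompt(values):
    """Subtract each prompt's mean over its samples (rows = prompts, columns = samples)."""
    values = np.asarray(values, dtype=float)
    return values - values.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
beta = 0.1  # hypothetical KL regularization weight

# Stand-ins (not real model outputs): 3 prompts, 4 samples per prompt.
log_ratio = rng.normal(size=(3, 4))  # plays the role of log p*(x|c) - log p_ref(x|c)
log_Z = rng.normal(size=(3, 1))      # unknown per-prompt constant log Z(c)

# Rewards implied by the KL-regularized optimum: r = beta * log_ratio + beta * log_Z.
rewards = beta * (log_ratio + log_Z)

# Centering within each prompt removes the per-prompt constant from both sides.
centered_rewards = center_within_prompt(rewards)
centered_ratio = beta * center_within_prompt(log_ratio)

print(np.allclose(centered_rewards, centered_ratio))
```

Because the constant is shared by all samples of a prompt, the centered rewards can be matched against centered log-ratios without ever estimating Z(c); this needs at least two samples per prompt.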

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis