Step-level Reward for Free in RL-based T2I Diffusion Model Fine-tuning

Xinyao Liao; Wei Wei; Xiaoye Qu; Yu Cheng

arXiv:2505.19196·cs.CV·May 27, 2025

TL;DR

This paper introduces a step-level reward framework for RL-based fine-tuning of text-to-image diffusion models, improving training efficiency and generalization by dynamically assigning dense rewards to denoising steps based on image similarity changes.

Contribution

It proposes a simple credit assignment method that distributes dense rewards across denoising steps without extra neural networks, enhancing sample efficiency and generalization.

Findings

01. Achieves 1.25 to 2 times higher sample efficiency.

02. Improves generalization across multiple human preference rewards.

03. Maintains the original optimal policy quality.

Abstract

Recent advances in text-to-image (T2I) diffusion model fine-tuning leverage reinforcement learning (RL) to align generated images with learnable reward functions. Existing approaches reformulate denoising as a Markov decision process for RL-driven optimization. However, they suffer from reward sparsity: each generated trajectory receives only a single delayed reward. This flaw hinders precise step-level attribution of denoising actions and undermines training efficiency. To address this, we propose a simple yet effective credit assignment framework that dynamically distributes dense rewards across denoising steps. Specifically, we track changes in cosine similarity between intermediate and final images to quantify each step's contribution to progressively reducing the distance to the final image. Our approach avoids additional auxiliary neural networks for step-level preference…
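The credit assignment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of raw arrays as "images", and the normalization that makes the step rewards sum to the trajectory reward are all assumptions made for illustration. The sketch measures how much each denoising step changes the cosine similarity to the final image and splits the single trajectory-level reward in proportion to those changes.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two arrays, flattened to vectors."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def step_level_rewards(intermediates, final_image, final_reward, eps=1e-8):
    """Split one trajectory-level reward across denoising steps.

    intermediates: sequence of T+1 images, ordered from the initial noise
        to the last denoised image (hypothetical input format).
    final_image: the image the trajectory ends at.
    final_reward: the scalar reward assigned to the whole trajectory.
    Returns an array of T step rewards that sum to final_reward.
    """
    sims = np.array([cosine_similarity(x, final_image) for x in intermediates])
    deltas = np.diff(sims)  # change in similarity produced by each step
    total = deltas.sum()
    if abs(total) < eps:
        # No net change in similarity: fall back to a uniform split (assumption).
        return np.full(len(deltas), final_reward / len(deltas))
    # Proportional split (assumed normalization); a step that moves away
    # from the final image receives a share of the opposite sign.
    return final_reward * deltas / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    final = rng.normal(size=(8, 8))
    noise = rng.normal(size=(8, 8))
    # Toy trajectory: linear interpolation from noise to the final image.
    trajectory = [(1 - t) * noise + t * final for t in np.linspace(0.0, 1.0, 6)]
    print(step_level_rewards(trajectory, final, final_reward=1.0))
```

In a real fine-tuning pipeline the similarity would more plausibly be computed on decoded or predicted clean images rather than on raw noisy latents; that choice, like the uniform fallback, is left open here.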

Code & Models

Repositories

lil-shake/coca (PyTorch, official)

Taxonomy

Methods: Diffusion · ALIGN