Rethinking the Design Space of Reinforcement Learning for Diffusion Models: On the Importance of Likelihood Estimation Beyond Loss Design

Jaemoo Choi; Yuchen Zhu; Wei Guo; Petr Molodyk; Bo Yuan; Jinbin Bai; Yi Xin; Molei Tao; Yongxin Chen

arXiv:2602.04663·cs.LG·May 20, 2026

TL;DR

This paper systematically analyzes the design space of reinforcement learning for diffusion models and finds that the choice of likelihood estimator, more than the loss design, drives the reported gains in efficiency and performance across benchmarks.

Contribution

The paper shows that an ELBO-based likelihood estimator, computed only from the final generated sample, is crucial for effective RL optimization, and that this choice matters more than the choice of policy-gradient loss.
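To make the idea of an ELBO-based likelihood estimate from the final sample concrete, here is a minimal sketch, not the paper's implementation. It assumes a noise-prediction model, a cosine noise schedule, uniform weighting over noise levels, and a GRPO-style normalized advantage; the names `elbo_log_likelihood`, `policy_gradient_surrogate`, and `eps_model` are hypothetical and chosen only for illustration.

```python
import numpy as np

def elbo_log_likelihood(x0, eps_model, rng, num_samples=8):
    """Monte Carlo ELBO-style surrogate for log p(x0), up to an additive constant.

    x0: array of shape (batch, dim) holding final generated samples.
    eps_model: hypothetical callable (x_t, t) -> predicted noise, same shape as x_t.
    Returns an array of shape (batch,).
    """
    batch = x0.shape[0]
    total = np.zeros(batch)
    for _ in range(num_samples):
        # Draw one random noise level per sample (assumption: t uniform in [0, 1]).
        t = rng.uniform(0.0, 1.0, size=(batch, 1))
        eps = rng.standard_normal(x0.shape)
        # Assumed cosine schedule; the paper's schedule may differ.
        alpha = np.cos(0.5 * np.pi * t)
        sigma = np.sin(0.5 * np.pi * t)
        # Re-noise the final sample only; no sampling trajectory is needed.
        x_t = alpha * x0 + sigma * eps
        err = eps_model(x_t, t) - eps
        # Negative denoising error with uniform weighting (a simplification).
        total += -np.sum(err ** 2, axis=1)
    return total / num_samples

def policy_gradient_surrogate(logp_new, logp_old, rewards):
    """Importance-weighted policy-gradient surrogate loss (to be minimized)."""
    # Group-normalized advantage, an assumption borrowed from GRPO-style methods.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    ratio = np.exp(logp_new - logp_old)
    return -np.mean(ratio * adv)
```

In this sketch the estimator touches only the final sample `x0`, so the same estimate can be plugged into different policy-gradient losses, which is the kind of separation between estimator and objective that the contribution statement describes.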

Findings

01 An ELBO-based likelihood estimator improves RL stability and efficiency

02 The method raises the GenEval score from 0.24 to 0.95

03 The method is 4.6x more efficient than FlowGRPO and 2x more efficient than DiffusionNFT

Abstract

Reinforcement learning has been widely applied to diffusion and flow models for visual tasks such as text-to-image generation. However, these tasks remain challenging because diffusion models have intractable likelihoods, which creates a barrier for directly applying popular policy-gradient type methods. Existing approaches primarily focus on crafting new objectives built on already heavily engineered LLM objectives, using ad hoc estimators for likelihood, without a thorough investigation into how such estimation affects overall algorithmic performance. In this work, we provide a systematic analysis of the RL design space by disentangling three factors: i) policy-gradient objectives, ii) likelihood estimators, and iii) rollout sampling schemes. We show that adopting an evidence lower bound (ELBO) based model likelihood estimator, computed only from the final generated sample, is the…

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis