Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey

Preeti Lamba, Kiran Ravish, Ankita Kushwaha, and Pawan Kumar

arXiv:2505.17352 · cs.CV · May 19, 2026

TL;DR

This survey reviews recent methods for aligning text-to-image diffusion models with human preferences and safety criteria, focusing on reinforcement learning, reward modeling, preference optimization, and safety-specific fine-tuning.

Contribution

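The survey organizes the literature on diffusion model alignment along five axes (feedback source, form of the reward or preference signal, optimization mechanism, handling of distribution shift and reward overoptimization, and explicit treatment of safety) and highlights methodological gaps and open challenges.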

Findings

01 Reviewed reinforcement learning from human feedback for diffusion models

02 Compared methods based on feedback requirements and safety robustness

03 Identified open challenges like multi-objective alignment and interpretability

Abstract

Diffusion models have become a central paradigm for image and multimodal generation, yet their deployment raises persistent questions about alignment, safety, preference satisfaction, and robustness to misuse. This survey reviews recent progress on aligning text-to-image diffusion models through reinforcement learning, reward modeling, preference optimization, and safety-specific fine-tuning. We organize the literature along five axes: the source of feedback, the form of the reward or preference signal, the optimization mechanism, the treatment of distribution shift and reward overoptimization, and the extent to which safety is addressed as an explicit constraint rather than a generic preference. The review covers reinforcement learning from human feedback, KL-regularized policy optimization, direct preference optimization, binary utility optimization, differentiable reward fine-tuning,…

Taxonomy

Methods: Diffusion · ALIGN