Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey
Preeti Lamba, Kiran Ravish, Ankita Kushwaha, and Pawan Kumar

TL;DR
This survey reviews recent methods for aligning text-to-image diffusion models with human preferences and safety criteria, focusing on reinforcement learning, reward modeling, preference optimization, and safety-specific fine-tuning.
Contribution
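It organizes the literature on diffusion model alignment along five axes (feedback source, form of the reward or preference signal, optimization mechanism, treatment of distribution shift and reward overoptimization, and explicit treatment of safety), highlighting methodological gaps and open challenges.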
Findings
Reviewed reinforcement learning from human feedback for diffusion models
Compared methods based on feedback requirements and safety robustness
Identified open challenges such as multi-objective alignment and interpretability
Abstract
Diffusion models have become a central paradigm for image and multimodal generation, yet their deployment raises persistent questions about alignment, safety, preference satisfaction, and robustness to misuse. This survey reviews recent progress on aligning text-to-image diffusion models through reinforcement learning, reward modeling, preference optimization, and safety-specific fine-tuning. We organize the literature along five axes: the source of feedback, the form of the reward or preference signal, the optimization mechanism, the treatment of distribution shift and reward overoptimization, and the extent to which safety is addressed as an explicit constraint rather than a generic preference. The review covers reinforcement learning from human feedback, KL-regularized policy optimization, direct preference optimization, binary utility optimization, differentiable reward fine-tuning,…
Taxonomy
Methods: Diffusion · ALIGN
