Aligning Text-to-Image Diffusion Models with Reward Backpropagation
Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki

TL;DR
This paper introduces AlignProp, a method for fine-tuning text-to-image diffusion models by backpropagating reward gradients through the denoising process, so that the model can be aligned with a downstream objective efficiently.
Contribution
AlignProp enables end-to-end reward backpropagation in diffusion models using low-rank adapters and gradient checkpointing, simplifying and enhancing reward-based fine-tuning.
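To make the low-rank adapter idea concrete, here is a minimal pure-Python sketch of a LoRA-style layer. It is an illustration only, not the paper's implementation: the helper names (`matvec`, `lora_forward`, `trainable_params`) and all dimensions are assumptions chosen for the example. The frozen weight `W` is left untouched and only the two small matrices `A` and `B` would be trained.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """Compute (W + (alpha / r) * B @ A) @ x without forming the full update.

    W: frozen d_out x d_in weight; A: r x d_in and B: d_out x r are trainable.
    """
    r = len(A)
    base = matvec(W, x)                 # frozen path
    low_rank = matvec(B, matvec(A, x))  # adapter path through the rank-r bottleneck
    return [b + (alpha / r) * l for b, l in zip(base, low_rank)]

def trainable_params(d_out, d_in, r):
    """Return (full fine-tuning count, LoRA count) for one weight matrix."""
    return d_out * d_in, r * (d_in + d_out)

# Hypothetical sizes, chosen only for illustration.
random.seed(0)
d_out, d_in, r = 4, 6, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]   # zero init: the adapter starts as a no-op
x = [1.0] * d_in

y = lora_forward(W, A, B, x)
full, lora = trainable_params(320, 320, 4)  # (102400, 2560)
```

With `B` initialized to zero, the adapted layer reproduces the frozen layer exactly at the start of fine-tuning. For a 320 x 320 matrix at rank 4, the adapter has 2,560 trainable parameters against 102,400 for full fine-tuning, which is where the memory saving comes from.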
Findings
Achieves higher reward scores faster than existing methods.
Requires less memory and fewer training steps, owing to its efficient design.
Effectively improves image-text alignment, aesthetics, and controllability.
Abstract
Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to their unsupervised training, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult. Recent works finetune diffusion models to downstream reward functions using vanilla reinforcement learning, notorious for the high variance of the gradient estimators. In this paper, we propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient through the denoising process. While naive implementation of such backpropagation would require prohibitive memory resources for storing the partial derivatives of modern text-to-image…
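The core idea in the abstract, backpropagating a reward gradient through the whole denoising chain, can be sketched on a toy scalar "sampler". Everything below is an assumption made for illustration: the step rule `denoise_step`, the quadratic `reward`, the step size, and the single parameter `theta` stand in for the real U-Net, reward model, and sampler. The forward pass stores only each step's input, and the backward pass recomputes the intermediate `tanh` values, which mirrors the memory trade-off of gradient checkpointing.

```python
import math

STEP_SIZE = 0.1  # hypothetical step size of the toy sampler

def denoise_step(x, theta):
    """One toy denoising step: subtract a scaled 'predicted noise' tanh(theta * x)."""
    return x - STEP_SIZE * math.tanh(theta * x)

def reward(x0, target=0.5):
    """Toy differentiable reward, highest when the final sample hits the target."""
    return -(x0 - target) ** 2

def sample(x_T, theta, num_steps):
    """Run the chain, keeping only each step's input (the checkpoints)."""
    xs = [x_T]
    for _ in range(num_steps):
        xs.append(denoise_step(xs[-1], theta))
    return xs

def reward_gradient(x_T, theta, num_steps, target=0.5):
    """Backpropagate dR/dtheta through every step, recomputing intermediates."""
    xs = sample(x_T, theta, num_steps)
    grad_x = -2.0 * (xs[-1] - target)   # dR/dx_0
    grad_theta = 0.0
    for x in reversed(xs[:-1]):         # inputs of each step, last step first
        h = math.tanh(theta * x)        # recomputed, not stored in the forward pass
        dh = 1.0 - h * h
        grad_theta += grad_x * (-STEP_SIZE * dh * x)  # direct effect of theta at this step
        grad_x *= 1.0 - STEP_SIZE * dh * theta        # chain rule back to this step's input
    return grad_theta

# Gradient ascent on the reward, starting from a hypothetical initial theta.
theta, x_T, num_steps = 0.3, 2.0, 10
before = reward(sample(x_T, theta, num_steps)[-1])
for _ in range(50):
    theta += 0.5 * reward_gradient(x_T, theta, num_steps)
after = reward(sample(x_T, theta, num_steps)[-1])
```

Because the gradient is exact rather than a sampled estimate, each update moves `theta` directly uphill on the reward; in this toy setup `after` ends up higher than `before`. A real implementation would rely on a framework's automatic differentiation and checkpointing utilities rather than hand-written derivatives.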
Peer Reviews
Decision: Submitted to ICLR 2024
- The paper studies an important problem of end-to-end backpropagating a reward function through the denoising process.
- The presented results look promising and the experiments are extensive and convincing.
- Clarification: in eq 3, does the first term come from weight decay?
- Typos: 1) eq 3 and 4, cdot notations are not consistent; 2) page 5 "policy \pi_{theta}"; 3) page 5 "k"m".
- Figure 3 presents visual results on a single image, which seems not "comprehensive" enough to study the impact of value of K (as stated in the last paragraph in page 5).
The experiments conducted are comprehensive and of high quality.
1. **Originality & Novelty:** The paper seems to lack significant originality and novelty. Implementing the two memory-saving techniques - finetuning with LoRA and gradient checkpointing - does not appear challenging, especially since they are already available in the “diffusers” package. Further, randomizing the number of denoising steps appears to be a straightforward approach, and it's not guaranteed to address the collapsing issue.
2. **Previous Work Reference:** The concept of using a diff
- The authors construct this paper with clear architecture and detailed discussion.
- The proposed method is fairly simple with extensive experiment results.
- The motivation is relatively clear and consistent with the human intuition.
- Novelty would be a controversial problem of this paper:
  - Methodology: The main technical components of this paper consist of two parts: 1) directly propagating the reward back to the diffusion models without RL, following DDPO, and 2) spanning the whole denoising process and perform back propagation through time, which has been utilized for diffusion model guidance / alignment early in DiffusionCLIP published in CVPR 2022.
  - Implementation: According to the authors, the main difficulty of
Code & Models
- 🤗 alibaba-pai/CogVideoX-Fun-V1.1-Reward-LoRAs · model · 237 dl · ♡ 60
- 🤗 alibaba-pai/Wan2.2-Fun-Reward-LoRAs · model · 28k dl · ♡ 65
- 🤗 alibaba-pai/EasyAnimateV5-Reward-LoRAs · model · ♡ 1
- 🤗 alibaba-pai/CogVideoX-Fun-V1.5-Reward-LoRAs · model · 97 dl · ♡ 5
- 🤗 alibaba-pai/Wan2.1-Fun-Reward-LoRAs · model · 2.8k dl · ♡ 60
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
Methods: Diffusion · Adapter
