Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
Ningyuan Yang, Jiaxuan Gao, Feng Gao, Yi Wu, Chao Yu

TL;DR
This paper introduces NCDPO, a framework for fine-tuning diffusion policies by backpropagating through the diffusion timesteps, which the authors report improves sample efficiency and final performance on decision-making tasks.
Contribution
NCDPO reformulates diffusion policies as noise-conditioned deterministic policies, allowing tractable likelihood evaluation and gradient backpropagation through all diffusion steps.
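To make the reformulation concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a toy linear denoiser, a made-up update rule, and a Gaussian action distribution centred on the denoised output; every name in it (`denoise`, `log_prob`, `fd_grad`, `theta`, `sigma`, `std`) is hypothetical. The point it illustrates is that once the per-step noise is sampled up front and treated as an input, the denoising chain is a deterministic function of the parameters, so a likelihood can be written down and differentiated through every step. Finite differences stand in for the autodiff backpropagation a real implementation would use.

```python
import numpy as np


def denoise(theta, state, noises, x_init, sigma=0.1):
    """Run K denoising steps with the per-step noise fixed in advance.

    Toy update rule (an assumption, not the paper's sampler):
        x <- x + 0.5 * (theta @ [state, x] - x) + sigma * noise_k
    With `noises` and `x_init` given, the output is a deterministic
    function of `theta` and `state`.
    """
    x = x_init
    for noise_k in noises:
        pred = theta @ np.concatenate([state, x])
        x = x + 0.5 * (pred - x) + sigma * noise_k
    return x


def log_prob(theta, state, noises, x_init, action, std=0.2):
    """Gaussian log-likelihood of `action` around the denoised output.

    The Gaussian head is an assumption made here so that the likelihood
    has a closed form.
    """
    mean = denoise(theta, state, noises, x_init)
    d = action.size
    return (-0.5 * np.sum(((action - mean) / std) ** 2)
            - d * np.log(std) - 0.5 * d * np.log(2.0 * np.pi))


def fd_grad(f, theta, eps=1e-5):
    """Central finite-difference gradient; a stand-in for autodiff."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        plus, minus = theta.copy(), theta.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grad[idx] = (f(plus) - f(minus)) / (2.0 * eps)
    return grad


rng = np.random.default_rng(0)
state_dim, act_dim, K = 3, 2, 5
theta = rng.normal(scale=0.3, size=(act_dim, state_dim + act_dim))
state = rng.normal(size=state_dim)
x_init = rng.normal(size=act_dim)          # initial noisy action
noises = rng.normal(size=(K, act_dim))     # pre-sampled, one per step

# Same noise in, same action mean out: the policy is deterministic
# once it is conditioned on the noise.
mean_a = denoise(theta, state, noises, x_init)
mean_b = denoise(theta, state, noises, x_init)

# The log-likelihood depends on theta through all K denoising steps.
action = mean_a + 0.2 * rng.normal(size=act_dim)
lp = log_prob(theta, state, noises, x_init, action)
grad = fd_grad(lambda th: log_prob(th, state, noises, x_init, action), theta)
```

In a PPO-style update, a log-likelihood of this kind would feed the probability ratio; with an autodiff framework the gradient would come from backpropagating through the loop in `denoise` rather than from `fd_grad`.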
Findings
When training from scratch, NCDPO achieves sample efficiency comparable to PPO with MLP policies.
NCDPO outperforms existing RL-based fine-tuning methods for diffusion policies in both sample efficiency and final performance.
The method is robust to the number of denoising timesteps.
Abstract
Diffusion policies, widely adopted in decision-making scenarios such as robotics, gaming and autonomous driving, are capable of learning diverse skills from demonstration data due to their high representation power. However, the sub-optimal and limited coverage of demonstration data could lead to diffusion policies that generate sub-optimal trajectories and even catastrophic failures. While reinforcement learning (RL)-based fine-tuning has emerged as a promising solution to address these limitations, existing approaches struggle to effectively adapt Proximal Policy Optimization (PPO) to diffusion models. This challenge stems from the computational intractability of action likelihood estimation during the denoising process, which leads to complicated optimization objectives. In our experiments starting from randomly initialized policies, we find that online tuning of Diffusion Policies…
Taxonomy
Topics: Advancements in Photolithography Techniques
Methods: Diffusion · Entropy Regularization · Proximal Policy Optimization
