Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps

Ningyuan Yang; Jiaxuan Gao; Feng Gao; Yi Wu; Chao Yu

arXiv:2505.10482·cs.LG·September 30, 2025

Open Access

TL;DR

This paper introduces NCDPO, a novel framework that enables efficient fine-tuning of diffusion policies by backpropagating through diffusion timesteps, significantly improving sample efficiency and performance in decision-making tasks.

Contribution

NCDPO reformulates diffusion policies as noise-conditioned deterministic policies, allowing tractable likelihood evaluation and gradient backpropagation through all diffusion steps.
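To make the noise-conditioned view concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a toy linear denoiser with a single scalar parameter `theta`, a toy noise schedule, and a Gaussian distribution around the final denoised output; the names `denoise_step`, `action_mean`, and `gaussian_log_prob` are hypothetical. The point it illustrates is that once every noise sample is drawn up front, the whole denoising chain is a deterministic function of the state and the noise, so the action likelihood has a closed form. In an autodiff framework the same chain would be differentiable end to end.

```python
import numpy as np

def denoise_step(theta, state, x, k, eps):
    """One toy reverse step (assumption: linear denoiser, not the paper's network)."""
    mean = x - theta * (x - state)   # pull the sample toward a state-dependent target
    sigma_k = 0.1 * k                # toy schedule; the last step (k = 0) adds no noise
    return mean + sigma_k * eps      # eps is pre-sampled, so this step is deterministic

def action_mean(theta, state, noises):
    """Run the full chain as a deterministic function of (state, noises)."""
    num_steps = len(noises) - 1
    x = noises[0]                    # the initial sample is itself pre-sampled noise
    for i in range(1, num_steps + 1):
        k = num_steps - i            # k counts down from num_steps - 1 to 0
        x = denoise_step(theta, state, x, k, noises[i])
    return x

def gaussian_log_prob(action, mean, std):
    """Log-density of an isotropic Gaussian (assumed action distribution)."""
    z = (action - mean) / std
    return float(np.sum(-0.5 * z**2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)))

rng = np.random.default_rng(0)
state = np.array([0.5, -0.2])
noises = rng.standard_normal((4, 2))          # one initial sample plus three steps
mu = action_mean(0.5, state, noises)          # deterministic given state and noises
action = mu + 0.1 * rng.standard_normal(2)    # sample around the denoised output
logp = gaussian_log_prob(action, mu, 0.1)     # tractable likelihood for a PPO-style ratio
```

Because `noises` is treated as part of the policy input, calling `action_mean` twice with the same arguments returns the same result, and `logp` depends on `theta` only through that deterministic chain.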

Findings

01. When training from scratch, NCDPO achieves sample efficiency comparable to PPO with MLP policies.

02. NCDPO outperforms existing methods in both sample efficiency and final performance.

03. The method is robust to the number of denoising timesteps.

Abstract

Diffusion policies, widely adopted in decision-making scenarios such as robotics, gaming and autonomous driving, are capable of learning diverse skills from demonstration data due to their high representation power. However, the sub-optimal quality and limited coverage of demonstration data can lead to diffusion policies that generate sub-optimal trajectories and even catastrophic failures. While reinforcement learning (RL)-based fine-tuning has emerged as a promising solution to address these limitations, existing approaches struggle to effectively adapt Proximal Policy Optimization (PPO) to diffusion models. This challenge stems from the computational intractability of action likelihood estimation during the denoising process, which leads to complicated optimization objectives. In our experiments starting from randomly initialized policies, we find that online tuning of Diffusion Policies…

Taxonomy

Topics: Advancements in Photolithography Techniques

Methods: Diffusion · Entropy Regularization · Proximal Policy Optimization