Efficient Online Reinforcement Learning for Diffusion Policy

Haitong Ma; Tianyi Chen; Kai Wang; Na Li; Bo Dai

arXiv:2502.00361 · cs.LG · July 1, 2025

TL;DR

This paper introduces Reweighted Score Matching (RSM), a method for efficient online reinforcement learning with diffusion policies. RSM does not require samples from the target policy and avoids backpropagating through the diffusion process, and it improves performance on MuJoCo benchmarks.

Contribution

The paper proposes Reweighted Score Matching for diffusion policies, which enables scalable online RL without sampling from the target distribution. It also introduces two algorithms built on it: Diffusion Policy Mirror Descent (DPMD) and Soft Diffusion Actor-Critic (SDAC).

Findings

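01. DPMD improves by more than 120% over soft actor-critic on the Humanoid and Ant tasks.

02. The proposed algorithms outperform recent diffusion-policy online RL methods on MuJoCo benchmarks.

03. Reweighted Score Matching keeps the low computational cost of denoising score matching and avoids the cost and instability of backpropagating the policy gradient through the diffusion process.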

Abstract

Diffusion policies have achieved superior performance in imitation learning and offline reinforcement learning (RL) due to their rich expressiveness. However, the conventional diffusion training procedure requires samples from the target distribution, which is impossible in online RL since we cannot sample from the optimal policy. Backpropagating the policy gradient through the diffusion process incurs huge computational cost and instability, making it expensive and hard to scale. To enable efficient training of diffusion policies in online RL, we generalize conventional denoising score matching by reweighting the loss function. The resulting Reweighted Score Matching (RSM) preserves the optimal solution and low computational cost of denoising score matching, while eliminating the need to sample from the target distribution and allowing the policy to be learned by optimizing value functions. We introduce…
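To make the sampling requirement concrete, here is a minimal sketch of a Monte Carlo denoising score matching loss in one dimension. It is an illustration under stated assumptions, not the paper's implementation: the function name `dsm_loss`, the Gaussian perturbation kernel with parameters `alpha` and `sigma`, and the toy data are all hypothetical. The optional `weights` argument is only a generic placeholder showing where a per-sample reweighting would enter the loss; the specific reweighting used by RSM is not given in this excerpt.

```python
import math
import random


def dsm_loss(score_fn, clean_actions, alpha, sigma, weights=None, seed=0):
    """Monte Carlo estimate of a 1-D denoising score matching loss.

    clean_actions must be samples from the target distribution, which is
    exactly what is unavailable in online RL. `weights` is a hypothetical
    per-sample weight (uniform if omitted), not the paper's RSM weighting.
    """
    rng = random.Random(seed)
    if weights is None:
        weights = [1.0] * len(clean_actions)
    total = 0.0
    for a0, w in zip(clean_actions, weights):
        eps = rng.gauss(0.0, 1.0)
        a_t = alpha * a0 + sigma * eps  # perturbed action
        # Score of the Gaussian kernel N(a_t; alpha * a0, sigma^2):
        # -(a_t - alpha * a0) / sigma^2 = -eps / sigma
        target = -eps / sigma
        total += w * (score_fn(a_t) - target) ** 2
    return total / len(clean_actions)


def true_score(a_t):
    # With a0 ~ N(0, 1) and alpha^2 + sigma^2 = 1, a_t ~ N(0, 1),
    # so the marginal score is -a_t.
    return -a_t


def zero_score(a_t):
    # A deliberately wrong baseline model.
    return 0.0


# Toy target distribution: standard normal "actions".
data_rng = random.Random(1)
clean = [data_rng.gauss(0.0, 1.0) for _ in range(5000)]
sigma = 0.6
alpha = math.sqrt(1.0 - sigma ** 2)

loss_true = dsm_loss(true_score, clean, alpha, sigma)
loss_zero = dsm_loss(zero_score, clean, alpha, sigma)
print(loss_true < loss_zero)
```

In this toy setup the true marginal score attains a lower loss than the zero baseline, but the estimate depends on `clean`, a batch of draws from the target distribution. In online RL the analogous draws would have to come from the optimal policy, which is the obstacle the abstract describes.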

Videos

Efficient Online Reinforcement Learning for Diffusion Policy · SlidesLive

Taxonomy

Topics: Neural Networks Stability and Synchronization