Modular Diffusion Policy Training: Decoupling and Recombining Guidance and Diffusion for Offline RL
Zhaoyang Chen, Cody Fleming

TL;DR
This paper introduces a modular approach to diffusion policy training in offline reinforcement learning, decoupling guidance from the diffusion model to improve efficiency, transferability, and performance.
Contribution
It proposes guidance-first diffusion training and shows that guidance modules can be transferred across algorithms, giving offline RL methods modular, reusable components.
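The guidance-first schedule can be illustrated with a toy two-phase loop. This is a minimal sketch under loud assumptions, not the authors' implementation. The guidance module is modeled as a hypothetical linear return predictor (`LinearGuidance`) fitted on a synthetic offline dataset. The second phase is a stub (`train_policy_with_frozen_guidance`) that only reads the frozen guidance outputs as conditioning labels and does not train an actual diffusion model. All names, the dataset, and the learning rate are illustrative.

```python
import random


def make_offline_dataset(n=200, seed=0):
    """Hypothetical toy offline dataset of (observation, action, return) triples.
    The guidance target depends only on logged data, not on any policy."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        obs = rng.uniform(-1.0, 1.0)
        act = rng.uniform(-1.0, 1.0)
        ret = 2.0 * obs - 1.0 * act + 0.5  # hypothetical noise-free return model
        data.append((obs, act, ret))
    return data


class LinearGuidance:
    """Hypothetical guidance module: predicts return from (obs, action)."""

    def __init__(self):
        self.w = [0.0, 0.0, 0.0]  # weights for obs, action, and bias
        self.frozen = False

    def predict(self, obs, act):
        return self.w[0] * obs + self.w[1] * act + self.w[2]

    def sgd_step(self, obs, act, target, lr):
        if self.frozen:
            raise RuntimeError("guidance module is frozen")
        err = self.predict(obs, act) - target
        # Gradient of 0.5 * err**2 with respect to each weight.
        for i, x in enumerate((obs, act, 1.0)):
            self.w[i] -= lr * err * x


def train_guidance_first(data, epochs=50, lr=0.1):
    """Phase 1: fit the guidance module on offline data alone, then freeze it."""
    guidance = LinearGuidance()
    for _ in range(epochs):
        for obs, act, ret in data:
            guidance.sgd_step(obs, act, ret, lr)
    guidance.frozen = True
    return guidance


def train_policy_with_frozen_guidance(data, guidance, steps=100):
    """Phase 2 stub (assumption): a diffusion policy learner would consume these
    guidance outputs as conditioning signals; the guidance weights are never updated."""
    return [guidance.predict(obs, act) for obs, act, _ in data[:steps]]


data = make_offline_dataset()
guidance = train_guidance_first(data)
labels = train_policy_with_frozen_guidance(data, guidance)
print([round(w, 3) for w in guidance.w])
```

Because phase 1 reads only logged data, the same frozen `guidance` object could be handed to a different phase-2 learner. That is the kind of reuse the transferability claim describes. Whether it holds for real networks and benchmarks is what the paper evaluates empirically.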
Findings
Guidance-first training improves sample efficiency and performance.
Independent guidance modules can be transferred across algorithms with minimal performance loss.
Decoupling guidance and diffusion reduces memory usage and computational costs.
Abstract
Classifier-free guidance has shown strong potential in diffusion-based reinforcement learning. However, existing methods train the guidance module jointly with the diffusion model, which can be suboptimal in the early stages of training, when the guidance is still inaccurate and provides noisy learning signals. In offline RL, guidance depends solely on offline data (observations, actions, and rewards) and is independent of the policy module's behavior, which suggests that joint training is not required. This paper proposes modular training methods that decouple the guidance module from the diffusion model, based on three key findings: Guidance Necessity: We explore how the effectiveness of guidance varies with the training stage and the choice of algorithm, uncovering the respective roles of guidance and diffusion. The lack of accurate guidance early in training presents an opportunity for optimization.…
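For readers unfamiliar with classifier-free guidance, the standard sampling rule combines the noise predictions of a single network evaluated with and without the condition. The version below is the generic textbook form, not necessarily the exact variant used in this paper. Here $\epsilon_\theta$ is the noise-prediction network, $x_t$ the noisy action at diffusion step $t$, $s$ the state, $c$ the guidance condition (for example, a return signal), $\varnothing$ the null condition, and $w \ge 0$ the guidance weight:

```latex
\tilde{\epsilon}_\theta(x_t, s, c) = (1 + w)\,\epsilon_\theta(x_t, s, c) - w\,\epsilon_\theta(x_t, s, \varnothing)
```

Setting $w = 0$ recovers the purely conditional prediction. Larger $w$ pushes samples further toward the condition, which is why an inaccurate condition early in training can degrade the learning signal.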
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning
