Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient
Zechu Li, Rickmer Krohn, Tao Chen, Anurag Ajay, Pulkit Agrawal, Georgia Chalvatzaki

TL;DR
This paper introduces DDiffPG, a novel reinforcement learning algorithm that learns multimodal policies from scratch using diffusion models, enabling versatile behaviors and explicit mode control in complex tasks.
Contribution
The paper proposes DDiffPG, combining diffusion models with mode-specific Q-learning and clustering to learn and maintain diverse multimodal policies in online RL.
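To make the pairing of clustering with mode-specific Q-learning concrete, here is a minimal sketch. It is not the paper's implementation. The plain k-means routine, the tabular Q-tables, the fixed-length 2D trajectories, and the names `cluster_trajectories` and `mode_specific_q_update` are all assumptions chosen for illustration. DDiffPG itself parameterizes the policy as a diffusion model, which this sketch omits entirely.

```python
import numpy as np


def cluster_trajectories(trajs, k, iters=20):
    """Assign each trajectory to one of k modes with plain k-means.

    trajs: array of shape (n, T, d); equal-length state sequences are an
    assumption of this sketch. Returns an int array of n mode labels.
    """
    x = trajs.reshape(len(trajs), -1).astype(float)
    # Farthest-point initialisation keeps the sketch deterministic.
    centers = [x[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dists)])
    centers = np.stack(centers)
    for _ in range(iters):
        pairwise = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(pairwise, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def mode_specific_q_update(q_tables, mode, s, a, r, s_next, done,
                           lr=0.5, gamma=0.99):
    """One tabular Q-learning step applied only to the Q-table of `mode`.

    q_tables: array of shape (num_modes, num_states, num_actions).
    Other modes' tables are left untouched, so one mode's value estimates
    cannot overwrite another's.
    """
    q = q_tables[mode]
    target = r + (0.0 if done else gamma * q[s_next].max())
    q[s, a] += lr * (target - q[s, a])


if __name__ == "__main__":
    # Hypothetical data: three trajectories heading "up", three heading "right".
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 5)
    up = np.stack([np.zeros_like(t), t], axis=1)
    right = np.stack([t, np.zeros_like(t)], axis=1)
    trajs = np.stack(
        [up + 0.01 * rng.standard_normal(up.shape) for _ in range(3)]
        + [right + 0.01 * rng.standard_normal(right.shape) for _ in range(3)]
    )
    modes = cluster_trajectories(trajs, k=2)
    print("mode labels:", modes)

    # One Q-table per discovered mode (3 states, 2 actions, all hypothetical).
    q_tables = np.zeros((2, 3, 2))
    mode_specific_q_update(q_tables, mode=modes[0], s=0, a=1, r=1.0,
                           s_next=1, done=True)
    print(q_tables)
```

Keeping a separate value estimate per mode is what stops a greedy update from collapsing everything onto the single highest-return behavior: each cluster's transitions only ever move that cluster's own table.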
Findings
Successfully learns multimodal behaviors in high-dimensional tasks
Enables explicit mode control via mode-specific embeddings
Demonstrates online replanning in maze navigation
Abstract
Deep reinforcement learning (RL) algorithms typically parameterize the policy as a deep network that outputs either a deterministic action or a stochastic one modeled as a Gaussian distribution, hence restricting learning to a single behavioral mode. Meanwhile, diffusion models emerged as a powerful framework for multimodal learning. However, the use of diffusion policies in online RL is hindered by the intractability of policy likelihood approximation, as well as the greedy objective of RL methods that can easily skew the policy to a single mode. This paper presents Deep Diffusion Policy Gradient (DDiffPG), a novel actor-critic algorithm that learns from scratch multimodal policies parameterized as diffusion models while discovering and maintaining versatile behaviors. DDiffPG explores and discovers multiple modes through off-the-shelf unsupervised clustering combined with…
Taxonomy
Topics: Reinforcement Learning in Robotics · Network Security and Intrusion Detection
