Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient

Zechu Li; Rickmer Krohn; Tao Chen; Anurag Ajay; Pulkit Agrawal; Georgia Chalvatzaki

arXiv:2406.00681 · cs.LG · June 4, 2024

Open Access

TL;DR

This paper introduces DDiffPG, a novel reinforcement learning algorithm that learns multimodal policies from scratch using diffusion models, enabling versatile behaviors and explicit mode control in complex tasks.

Contribution

The paper proposes DDiffPG, combining diffusion models with mode-specific Q-learning and clustering to learn and maintain diverse multimodal policies in online RL.
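The contribution names two bookkeeping ingredients: grouping discovered behaviors into modes by clustering, and keeping mode-specific data for mode-specific Q-learning. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes modes are found by single-linkage clustering of whole trajectories with a hand-picked distance threshold (a stand-in for the unspecified "off-the-shelf unsupervised clustering"), and the helper names `cluster_trajectories` and `split_by_mode` are hypothetical.

```python
import numpy as np


def cluster_trajectories(trajs, threshold):
    """Hypothetical stand-in for off-the-shelf clustering: single-linkage
    clustering of whole trajectories, cut at a distance `threshold`.

    trajs: array of shape (N, T, d), N trajectories of T states each.
    Returns an int array of mode ids 0, 1, 2, ... of length N.
    """
    n = len(trajs)
    flat = trajs.reshape(n, -1)
    # Pairwise Euclidean distance between flattened trajectories.
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single linkage: any pair closer than the threshold shares a cluster.
    for i in range(n):
        for j in range(i + 1, n):
            if dists[i, j] < threshold:
                parent[find(i)] = find(j)

    # Relabel cluster roots as consecutive mode ids in order of first appearance.
    ids = {}
    return np.array([ids.setdefault(find(i), len(ids)) for i in range(n)])


def split_by_mode(labels, values):
    """Partition per-trajectory data by mode, e.g. one buffer per mode-specific critic."""
    return {int(k): values[labels == k] for k in np.unique(labels)}


# Usage: three noisy trajectories heading left and three heading right in 2-D.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 10)[:, None]
left = np.hstack([-t, np.zeros_like(t)])
right = np.hstack([t, np.zeros_like(t)])
trajs = np.stack(
    [left + 0.01 * rng.standard_normal(left.shape) for _ in range(3)]
    + [right + 0.01 * rng.standard_normal(right.shape) for _ in range(3)]
)
labels = cluster_trajectories(trajs, threshold=0.5)
returns = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0])  # made-up per-trajectory returns
per_mode = split_by_mode(labels, returns)
```

Each entry of `per_mode` would then feed its own critic, which is one plausible reading of "mode-specific Q-learning"; the paper's clustering features, distance measure, and critic updates may differ.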

Findings

1. Successfully learns multimodal behaviors in high-dimensional tasks
2. Enables explicit mode control via mode-specific embeddings
3. Demonstrates online replanning in maze navigation
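To make "explicit mode control via mode-specific embeddings" concrete, here is a toy DDPM-style action sampler whose output is selected by a mode id. This is a hedged sketch, not the paper's architecture. The assumptions are: each mode is a single target action (`MODE_ACTIONS`, hypothetical values); the learned noise-prediction network is replaced by an analytic oracle (`oracle_eps`, hypothetical), which is exact for such point-mass modes; the "embedding" is just an index into that table; and the linear noise schedule is a common DDPM choice, not necessarily the one used by DDiffPG.

```python
import numpy as np

# Hypothetical per-mode target actions (2-D), standing in for what a trained
# mode-conditioned diffusion policy would produce.
MODE_ACTIONS = np.array([[-1.0, 0.5], [1.0, -0.5]])

T = 50
betas = np.linspace(1e-4, 0.2, T)  # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def oracle_eps(x_t, t, mode):
    """Exact noise prediction when `mode` is a point mass at MODE_ACTIONS[mode].
    A real policy would use a network eps_theta(x_t, t, state, mode_embedding)."""
    mu = MODE_ACTIONS[mode]
    return (x_t - np.sqrt(alpha_bars[t]) * mu) / np.sqrt(1.0 - alpha_bars[t])


def sample_action(mode, rng):
    """DDPM ancestral sampling: start from Gaussian noise and denoise for T steps."""
    x = rng.standard_normal(MODE_ACTIONS.shape[1])
    for t in reversed(range(T)):
        eps = oracle_eps(x, t, mode)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # sigma_t^2 = beta_t, one of the standard DDPM variance choices.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise on the final step
    return x


rng = np.random.default_rng(0)
action_mode0 = sample_action(0, rng)
action_mode1 = sample_action(1, rng)
```

Changing only the `mode` argument switches the behavior while the sampling loop stays identical. With a learned network instead of the oracle, each mode would yield a distribution of actions rather than one exact point.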

Abstract

Deep reinforcement learning (RL) algorithms typically parameterize the policy as a deep network that outputs either a deterministic action or a stochastic one modeled as a Gaussian distribution, hence restricting learning to a single behavioral mode. Meanwhile, diffusion models have emerged as a powerful framework for multimodal learning. However, the use of diffusion policies in online RL is hindered by the intractability of policy likelihood approximation, as well as the greedy objective of RL methods that can easily skew the policy to a single mode. This paper presents Deep Diffusion Policy Gradient (DDiffPG), a novel actor-critic algorithm that learns multimodal policies parameterized as diffusion models from scratch while discovering and maintaining versatile behaviors. DDiffPG explores and discovers multiple modes through off-the-shelf unsupervised clustering combined with…
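The abstract's opening claim, that a Gaussian policy restricts learning to a single behavioral mode, can be checked numerically on toy data. The sketch assumes a bimodal set of good 1-D actions clustered near -1 and +1 (for example, passing an obstacle on the left or on the right); these numbers are illustrative and not taken from the paper. The maximum-likelihood Gaussian then centers its mean, and its density peak, near 0, an action that neither mode takes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy bimodal action data: half the good actions near -1, half near +1.
actions = np.concatenate([
    rng.normal(-1.0, 0.05, size=500),
    rng.normal(+1.0, 0.05, size=500),
])

# Maximum-likelihood Gaussian fit: sample mean and (biased) standard deviation.
mu, sigma = actions.mean(), actions.std()

# Fraction of observed actions within 0.1 of the fitted mean, where the
# Gaussian density is highest.
near_mean = np.mean(np.abs(actions - mu) < 0.1)
print(f"mu={mu:.3f} sigma={sigma:.3f} fraction_near_mean={near_mean:.3f}")
```

The fitted Gaussian is most likely to sample actions in a region that contains essentially none of the observed good actions, which is the failure mode that a more expressive policy class such as a diffusion model is meant to avoid.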

Videos

Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics