Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization

Shutong Ding; Ke Hu; Zhenhao Zhang; Kan Ren; Weinan Zhang; Jingyi Yu; Jingya Wang; Ye Shi

arXiv:2405.16173 · cs.LG · December 17, 2024

TL;DR

This paper introduces QVPO, a novel diffusion-based online reinforcement learning algorithm that enhances exploration and multimodality, achieving state-of-the-art results on MuJoCo benchmarks.

Contribution

It proposes a new Q-weighted variational loss for diffusion policies in online RL, addressing optimization challenges and improving exploration and sample efficiency.

Findings

1. QVPO achieves state-of-the-art performance on MuJoCo benchmarks.
2. The Q-weighted variational loss provides a tight lower bound for policy optimization.
3. Enhanced exploration capabilities lead to better cumulative rewards.

Abstract

Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality. It has been verified that utilizing diffusion policies can significantly improve the performance of RL algorithms in continuous control tasks by overcoming the limitations of unimodal policies, such as Gaussian policies, and providing the agent with enhanced exploration capabilities. However, existing works mainly focus on the application of diffusion policies in offline RL, while their incorporation into online RL is less investigated. The training objective of the diffusion model, known as the variational lower bound, cannot be optimized directly in online RL due to the unavailability of 'good' actions. This leads to difficulties in conducting diffusion policy improvement. To overcome this, we propose a novel model-free diffusion-based online RL…
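The idea of weighting the diffusion training objective by Q-values can be illustrated with a small sketch. This is not the authors' implementation: it assumes scalar actions, a standard noise-prediction (denoising) loss as a stand-in for the variational lower bound, and a simple clipping rule that sets negative Q-values to zero so that the weights stay non-negative. The names `q_weighted_denoising_loss`, `zero_eps_model`, and `alpha_bar` are hypothetical and chosen for illustration only.

```python
import math
import random


def zero_eps_model(noisy_action, state, step):
    """Hypothetical placeholder noise predictor that always predicts zero."""
    return 0.0


def q_weighted_denoising_loss(actions, states, q_values, eps_model, alpha_bar, rng):
    """Q-weighted noise-prediction loss over a batch of scalar actions.

    Each sample's squared denoising error is scaled by a non-negative weight
    derived from its Q-value, so higher-valued actions pull the policy harder.
    """
    # Assumption: clip negative Q-values to zero to keep weights non-negative.
    weights = [max(q, 0.0) for q in q_values]
    total = 0.0
    for action, state, weight in zip(actions, states, weights):
        step = rng.randrange(len(alpha_bar))   # random diffusion step
        eps = rng.gauss(0.0, 1.0)              # Gaussian noise to be predicted
        # Forward diffusion: blend the clean action with noise.
        noisy = math.sqrt(alpha_bar[step]) * action + math.sqrt(1.0 - alpha_bar[step]) * eps
        error = eps - eps_model(noisy, state, step)
        total += weight * error * error
    return total / len(actions)


# Example call with made-up numbers; alpha_bar is an illustrative noise schedule.
loss = q_weighted_denoising_loss(
    actions=[0.3, -0.7],
    states=[1.0, 2.0],
    q_values=[1.5, -0.2],
    eps_model=zero_eps_model,
    alpha_bar=[0.99, 0.9, 0.5],
    rng=random.Random(0),
)
```

In this sketch an action with a non-positive Q-value contributes nothing to the loss, which is one simple way to let only 'good' actions shape the diffusion policy; the paper's actual weighting scheme may differ.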

Code & Models

Repositories

wadx2019/qvpo
PyTorch · Official

Videos

Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization · SlidesLive

Taxonomy

Topics: Elevator Systems and Control · Traffic control and management

Methods: Diffusion · Entropy Regularization · Focus