Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li, Dongjiang Li, Zecui Zeng, Lei Sun, Yue Chen, Xuelong Wei, Lusong Li, Xiaodong He

arXiv:2405.18729 · cs.LG · May 30, 2024

TL;DR

This paper introduces a preferred-action-optimized diffusion policy for offline RL that leverages a conditional diffusion model and anti-noise optimization to improve policy performance, especially in sparse-reward tasks.
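
To make the "conditional diffusion model" concrete, here is a minimal sketch of the standard DDPM noise-prediction objective applied to a behavior policy conditioned on the state. It is an illustration under assumptions, not the authors' code: the linear noise schedule, the function names, and the placeholder `eps_model` (standing in for a neural network) are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]

def make_alpha_bars(num_steps: int = 100, beta_start: float = 1e-4,
                    beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{i<=t} (1 - beta_i) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(a0: Sequence[float], t: int, noise: Sequence[float],
             alpha_bars: Sequence[float]) -> Vector:
    """Forward noising: a_t = sqrt(alpha_bar_t) * a_0 + sqrt(1 - alpha_bar_t) * eps."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(a0, noise)]

# Hypothetical signature: eps_model(noisy_action, state, t) -> predicted noise.
EpsModel = Callable[[Vector, Sequence[float], int], Vector]

def denoising_loss(eps_model: EpsModel, states, actions, alpha_bars,
                   rng: random.Random) -> float:
    """Monte Carlo estimate of E ||eps - eps_theta(a_t, s, t)||^2, averaged per action dim."""
    total, count = 0.0, 0
    for s, a0 in zip(states, actions):
        t = rng.randrange(len(alpha_bars))           # uniform diffusion timestep
        noise = [rng.gauss(0.0, 1.0) for _ in a0]    # eps ~ N(0, I)
        pred = eps_model(q_sample(a0, t, noise, alpha_bars), s, t)
        total += sum((e - p) ** 2 for e, p in zip(noise, pred))
        count += len(a0)
    return total / count

# Usage on toy data (states are 3-D, actions 2-D); the zero predictor is a placeholder.
rng = random.Random(0)
alpha_bars = make_alpha_bars()
states = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2000)]
actions = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(2000)]
zero_model = lambda a_t, s, t: [0.0] * len(a_t)
loss = denoising_loss(zero_model, states, actions, alpha_bars, rng)  # roughly 1.0
```

Minimizing this loss over dataset pairs (s, a) fits the model to the behavior policy's action distribution; actions are then sampled by running the reverse chain from Gaussian noise. The zero predictor scores about 1.0 because the target noise has unit variance in every action dimension.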

Contribution

The paper proposes a novel diffusion-based offline RL method with preferred-action optimization and anti-noise training, enhancing policy diversity and stability.
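
The summary does not say how preferred actions are chosen or how the anti-noise objective is defined (the abstract is truncated before that point), so the following is only a hedged sketch of one plausible pattern. It assumes candidate actions have already been sampled from the behavior diffusion model for a given state, ranks them with a hypothetical critic to form a (preferred, dispreferred) pair, and swaps in a generic label-smoothed logistic preference loss as a stand-in for noise-robust training. The names `select_preference_pair` and `smoothed_preference_loss` and the parameters `beta` and `eps` are illustrative, not taken from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Action = List[float]

def select_preference_pair(candidates: Sequence[Action],
                           critic: Callable[[Action], float]) -> Tuple[Action, Action]:
    """Rank candidate actions by a critic and return (preferred, dispreferred).

    Assumption: `candidates` were sampled from the behavior diffusion model for one
    state, so both returned actions stay inside the behavior distribution.
    """
    ranked = sorted(candidates, key=critic)  # ascending critic value
    return list(ranked[-1]), list(ranked[0])

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def smoothed_preference_loss(margin: float, beta: float = 1.0, eps: float = 0.1) -> float:
    """Label-smoothed pairwise logistic loss (a generic noise-robust preference loss).

    margin: policy score of the preferred action minus that of the dispreferred one
            (for a diffusion policy, some likelihood surrogate; hypothetical here).
    eps:    assumed probability that the preference label is wrong; eps = 0 gives
            the plain loss -log(sigmoid(beta * margin)).
    """
    z = beta * margin
    return -(1.0 - eps) * log_sigmoid(z) - eps * log_sigmoid(-z)

# Usage with a hypothetical toy critic peaked at a[0] = 0.5.
critic = lambda a: -(a[0] - 0.5) ** 2
best, worst = select_preference_pair([[0.0], [0.4], [0.9], [-0.7]], critic)  # [0.4], [-0.7]
```

With `eps > 0` the loss is minimized at a finite margin, where `beta * margin = log((1 - eps) / eps)`, rather than rewarding an ever larger margin; this caps how hard a possibly mislabeled pair can pull on the policy.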

Findings

1. Achieves superior performance on sparse-reward tasks such as Kitchen and AntMaze.
2. Demonstrates the effectiveness of anti-noise preference optimization.
3. Outperforms previous state-of-the-art offline RL methods.

Abstract

Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, owing to their powerful representational capabilities, diffusion models have shown significant potential as policy models for offline RL problems. However, previous offline RL algorithms based on diffusion policies generally adopt weighted regression to improve the policy. This approach optimizes the policy using only the collected actions and is sensitive to Q-values, which limits the potential for further performance enhancement. To this end, we propose a novel preferred-action-optimized diffusion policy for offline RL. In particular, an expressive conditional diffusion model is utilized to represent the diverse distribution of a behavior policy. Meanwhile, based on the diffusion model, preferred actions within the same behavior distribution are automatically generated through…
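
For contrast, the weighted-regression update the abstract critiques can be sketched in the style of advantage-weighted regression. This is a generic illustration under assumptions: the exponential weighting, the temperature, the clipping constant, and all names are hypothetical, and individual prior methods differ in details. It shows the two limitations the abstract names: the loss is only ever evaluated on dataset actions, and each weight depends exponentially on the Q estimate.

```python
import math
from typing import List, Sequence

def regression_weights(q_values: Sequence[float], v_value: float,
                       temperature: float = 1.0, max_weight: float = 100.0) -> List[float]:
    """Per-action weights w_i = min(exp((Q(s, a_i) - V(s)) / T), max_weight).

    The a_i are the actions stored in the dataset; no new actions are ever scored.
    Clipping is done in log space so large advantages cannot overflow math.exp.
    """
    log_cap = math.log(max_weight)
    return [math.exp(min((q - v_value) / temperature, log_cap)) for q in q_values]

def weighted_bc_loss(per_action_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted behavior cloning: mean of w_i * l_i (l_i e.g. a per-sample denoising loss)."""
    return sum(w * l for w, l in zip(weights, per_action_losses)) / len(weights)

# Usage: a 0.1 shift in one Q estimate changes that action's weight by a factor of e.
w_before = regression_weights([1.0, 1.2], v_value=1.0, temperature=0.1)  # about [1.0, 7.39]
w_after = regression_weights([1.0, 1.3], v_value=1.0, temperature=0.1)   # about [1.0, 20.09]
```

Because each weight is `exp((Q - V) / T)`, an error of δ in one Q estimate rescales that action's weight by `exp(δ / T)`; with `T = 0.1`, a 0.1 error multiplies it by e ≈ 2.72.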


Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Diffusion