Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning

Linjiajie Fang; Ruoxue Liu; Jing Zhang; Wenjia Wang; Bing-Yi Jing

arXiv:2405.20555 · cs.LG · February 26, 2025

TL;DR

This paper introduces Diffusion Actor-Critic (DAC), a novel offline reinforcement learning method that uses diffusion models to represent policies and enforce constraints, leading to improved stability and performance.

Contribution

It formulates constrained policy iteration as a diffusion noise regression problem, enabling direct diffusion-based policy representation and stable learning in offline RL.
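For readers unfamiliar with the term, "diffusion noise regression" usually refers to the standard denoising objective: corrupt a clean sample with Gaussian noise at a random step, then regress a network's output onto that noise. The sketch below shows only this generic objective for state-conditioned actions, in NumPy. It is not DAC's actor loss, which according to the paper also encodes the constrained policy-iteration step. The function names, the linear noise schedule, and the `eps_model(noisy_actions, states, t)` signature are all hypothetical choices made for illustration.

```python
import numpy as np


def make_alpha_bars(num_steps=50, beta_min=1e-4, beta_max=0.02):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    return np.cumprod(1.0 - betas)


def noise_regression_loss(eps_model, states, actions, alpha_bars, rng):
    """Generic denoising loss: mean squared error between true and predicted noise.

    eps_model is a hypothetical callable (noisy_actions, states, t) -> noise estimate.
    """
    batch = actions.shape[0]
    # Sample one diffusion step per transition.
    t = rng.integers(0, len(alpha_bars), size=batch)
    # Sample the Gaussian noise the model is asked to recover.
    eps = rng.standard_normal(actions.shape)
    a_bar = alpha_bars[t][:, None]
    # Forward process: blend the clean action with noise.
    noisy_actions = np.sqrt(a_bar) * actions + np.sqrt(1.0 - a_bar) * eps
    pred = eps_model(noisy_actions, states, t)
    return float(np.mean(np.sum((pred - eps) ** 2, axis=-1)))


# Usage with placeholder data and a placeholder model that always predicts zero noise.
rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
states = rng.standard_normal((256, 4))
actions = rng.uniform(-1.0, 1.0, size=(256, 2))
loss = noise_regression_loss(
    lambda noisy, s, t: np.zeros_like(noisy), states, actions, alpha_bars, rng
)
```

In practice the placeholder model would be a trainable network and the loss would be minimized by gradient descent; the official repository listed under Code & Models uses JAX for this.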

Findings

01 Outperforms state-of-the-art methods on D4RL benchmarks.

02 Preserves policy multi-modality for better exploration.

03 Ensures stable convergence through diffusion-based regularization.

Abstract

In offline reinforcement learning, it is necessary to manage out-of-distribution actions to prevent overestimation of value functions. One class of methods, the policy-regularized method, addresses this problem by constraining the target policy to stay close to the behavior policy. Although several approaches suggest representing the behavior policy as an expressive diffusion model to boost performance, it remains unclear how to regularize the target policy given a diffusion-modeled behavior sampler. In this paper, we propose Diffusion Actor-Critic (DAC), which formulates Kullback-Leibler (KL) constrained policy iteration as a diffusion noise regression problem, enabling direct representation of target policies as diffusion models. Our approach follows the actor-critic learning paradigm, in which we alternately train a diffusion-modeled target policy and a critic network. The actor…
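For orientation, KL-constrained policy improvement is commonly written in the following textbook form. This is a sketch of the standard formulation, not a transcription of the paper's equations: the symbols $\pi_b$ (behavior policy), $\mathcal{D}$ (offline dataset), $\epsilon_b$ (constraint budget), and $\eta$ (inverse Lagrange multiplier) are notation assumed here, and the paper's exact objective may differ.

```latex
\pi_{k+1} = \arg\max_{\pi}\;
  \mathbb{E}_{s \sim \mathcal{D},\, a \sim \pi(\cdot \mid s)}
  \left[ Q^{\pi_k}(s, a) \right]
\quad \text{s.t.} \quad
  D_{\mathrm{KL}}\!\left( \pi(\cdot \mid s) \,\|\, \pi_b(\cdot \mid s) \right)
  \le \epsilon_b .

% Lagrangian relaxation with multiplier 1/\eta, solved pointwise in s:
\pi^{*}(a \mid s) \;\propto\; \pi_b(a \mid s)\,
  \exp\!\left( \eta\, Q^{\pi_k}(s, a) \right).
```

The relaxed solution reweights the behavior policy by exponentiated values, which is why a sampler for $\pi_b$ alone does not directly yield a sampler for the improved policy.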

Code & Models

Repositories

Fang-Lin93/DAC (JAX, official)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Dynamic Algorithm Configuration · Diffusion