Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning

Chen-Xiao Gao; Chenyang Wu; Mingjun Cao; Chenjun Xiao; Yang Yu; Zongzhang Zhang

arXiv:2502.04778 · cs.LG · May 30, 2025

TL;DR

This paper introduces BDPO, an offline RL framework that applies behavior regularization to diffusion-based policies, combining the expressiveness of diffusion models with the robustness of regularization. The method is validated on synthetic 2D tasks and D4RL continuous control benchmarks.

Contribution

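It extends behavior-regularized RL to diffusion policies by computing the KL regularization analytically, as the accumulated discrepancies between the reverse-time transition kernels of the diffusion policy and the behavior policy. This makes effective policy optimization possible with such expressive parameterizations.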
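The chain rule of KL divergence gives one way to write such a decomposition. This is a sketch under stated assumptions, not necessarily the paper's exact objective: the policy π and the behavior policy ν are both N-step diffusion models over actions a^{0:N} conditioned on a state s, both start from the same Gaussian prior over a^N, and their reverse kernels are Gaussians with means μ_n^π and μ_n^ν and a common fixed variance σ_n^2 (this notation is hypothetical, not taken from the paper).

```latex
\begin{aligned}
D_{\mathrm{KL}}\!\left(p^{\pi}(a^{0:N}\mid s)\,\middle\|\,p^{\nu}(a^{0:N}\mid s)\right)
&= \sum_{n=1}^{N} \mathbb{E}_{a^{n}\sim p^{\pi}(\cdot\mid s)}\!\left[D_{\mathrm{KL}}\!\left(p^{\pi}(a^{n-1}\mid a^{n},s)\,\middle\|\,p^{\nu}(a^{n-1}\mid a^{n},s)\right)\right] \\
&= \sum_{n=1}^{N} \mathbb{E}_{a^{n}\sim p^{\pi}(\cdot\mid s)}\!\left[\frac{\lVert \mu^{\pi}_{n}(a^{n},s)-\mu^{\nu}_{n}(a^{n},s)\rVert^{2}}{2\sigma_{n}^{2}}\right] \\
&\ge D_{\mathrm{KL}}\!\left(\pi(a^{0}\mid s)\,\middle\|\,\nu(a^{0}\mid s)\right)
\end{aligned}
```

The prior term of the chain rule vanishes because both chains start from the same distribution, and the final inequality is the data-processing inequality: the action marginal a^0 is obtained by integrating out the intermediate steps a^{1:N}.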

Findings

1. BDPO outperforms existing methods on synthetic 2D tasks.
2. BDPO achieves superior results on D4RL continuous control benchmarks.
3. The framework effectively balances the expressiveness of diffusion policies with the robustness provided by behavior regularization.

Abstract

Behavior regularization, which constrains the policy to stay close to some behavior policy, is widely used in offline reinforcement learning (RL) to manage the risk of hazardous exploitation of unseen actions. Nevertheless, existing literature on behavior-regularized RL primarily focuses on explicit policy parameterizations, such as Gaussian policies. Consequently, it remains unclear how to extend this framework to more advanced policy parameterizations, such as diffusion models. In this paper, we introduce BDPO, a principled behavior-regularized RL framework tailored for diffusion-based policies, thereby combining the expressive power of diffusion policies and the robustness provided by regularization. The key ingredient of our method is to calculate the Kullback-Leibler (KL) regularization analytically as the accumulated discrepancies in reverse-time transition kernels along the…
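The description of the KL term as accumulated discrepancies between reverse-time transition kernels can be sketched numerically. This is a minimal Monte Carlo sketch, assuming Gaussian reverse kernels with a shared, fixed standard-deviation schedule and a common N(0, I) starting distribution. `path_kl_estimate`, `gaussian_kl_shared_var`, and the toy mean functions are hypothetical stand-ins for trained networks, not the authors' implementation. The sketch samples chains from the policy and sums the closed-form per-step Gaussian KL along each chain.

```python
import numpy as np


def gaussian_kl_shared_var(mu_p, mu_q, sigma):
    """KL(N(mu_p, sigma^2 I) || N(mu_q, sigma^2 I)), batched over leading dims.

    With equal isotropic covariances, the trace and log-determinant terms
    cancel, leaving the squared mean distance scaled by 1 / (2 sigma^2).
    """
    return np.sum((mu_p - mu_q) ** 2, axis=-1) / (2.0 * sigma ** 2)


def path_kl_estimate(mean_pi, mean_nu, sigmas, state, action_dim,
                     num_samples=256, seed=0):
    """Monte Carlo estimate of the KL between two reverse diffusion chains.

    mean_pi, mean_nu: hypothetical callables (a_n, state, n) -> mean of the
        reverse kernel p(a^{n-1} | a^n, s) for the policy and behavior model.
    sigmas: sigmas[n - 1] is the shared std of reverse step n (n = 1..N).
    Assumes both chains start from the same N(0, I) prior, so the prior
    term of the KL chain rule is zero.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((num_samples, action_dim))  # a^N ~ N(0, I)
    total = np.zeros(num_samples)
    for n in range(len(sigmas), 0, -1):  # n = N, ..., 1
        sigma = sigmas[n - 1]
        mu_p = mean_pi(a, state, n)
        mu_q = mean_nu(a, state, n)
        # Closed-form KL between the two reverse kernels at the current a^n.
        total += gaussian_kl_shared_var(mu_p, mu_q, sigma)
        # Step the chain with the *policy* kernel: the chain rule takes the
        # expectation under the policy's trajectory distribution.
        a = mu_p + sigma * rng.standard_normal(a.shape)
    return float(total.mean())


# Toy, hypothetical mean functions standing in for trained networks.
def behavior_mean(a, state, n):
    return 0.9 * a


def policy_mean(a, state, n):
    return 0.9 * a + 0.1  # behavior mean shifted by a constant offset


sigmas = np.linspace(0.05, 0.5, 10)
kl = path_kl_estimate(policy_mean, behavior_mean, sigmas, state=None, action_dim=2)
print(f"estimated path KL: {kl:.4f}")
```

Because the toy policy differs from the behavior mean by a constant offset of 0.1 in each action dimension, every per-step term equals 2 · 0.1² / (2σ_n²) regardless of the sampled a^n, which makes the estimate easy to check by hand.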

Videos

Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Elevator Systems and Control · Reinforcement Learning in Robotics · Traffic control and management

Methods: Diffusion · Focus