DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning
Longxiang He, Li Shen, Linrui Zhang, Junbo Tan, Xueqian Wang

TL;DR
DiffCPS introduces a diffusion model-based approach for constrained policy search in offline reinforcement learning, overcoming limitations of Gaussian policies and enabling better policy expressivity with theoretical guarantees.
Contribution
The paper proposes DiffCPS, a novel diffusion model-based constrained policy search method with a primal-dual framework and theoretical analysis of duality and convergence.
Findings
DiffCPS outperforms traditional AWR-based methods on D4RL benchmarks.
DiffCPS achieves competitive or superior results compared to recent diffusion-based offline RL methods.
Theoretical analysis confirms strong duality and convergence properties of DiffCPS.
Abstract
Constrained policy search (CPS) is a fundamental problem in offline reinforcement learning, which is generally solved by advantage weighted regression (AWR). However, previous methods may still encounter out-of-distribution actions due to the limited expressivity of Gaussian-based policies. On the other hand, directly applying state-of-the-art models with strong distribution expression capabilities (i.e., diffusion models) in the AWR framework is intractable, since AWR requires exact policy probability densities, which are unavailable in diffusion models. In this paper, we propose a novel approach, dubbed DiffCPS, which tackles diffusion-based constrained policy search with the primal-dual method. The theoretical analysis reveals that strong duality holds for diffusion-based CPS problems, and upon introducing parameter approximation,…
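To make the primal-dual idea in the abstract concrete, here is a minimal toy sketch. It is not the DiffCPS algorithm: it assumes a three-action bandit with a tabular softmax policy, whose density is available in closed form (exactly what a diffusion policy lacks). The constraint direction KL(behavior || policy), the threshold `eps`, the step sizes, and all names (`primal_dual_cps`, `q_values`, `behavior`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action logits.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def kl(p, q):
    # KL(p || q) for strictly positive discrete distributions.
    return float(np.sum(p * np.log(p / q)))

def primal_dual_cps(q_values, behavior, eps, steps=3000,
                    lr_theta=0.5, lr_lam=0.05):
    """Toy primal-dual loop for: max_pi E_pi[Q]  s.t.  KL(behavior || pi) <= eps.

    Lagrangian: L(theta, lam) = E_pi[Q] - lam * (KL(behavior || pi) - eps).
    """
    theta = np.log(behavior)  # start at the behavior policy
    lam = 1.0                 # Lagrange multiplier (hypothetical initial value)
    for _ in range(steps):
        pi = softmax(theta)
        # Gradient of E_pi[Q] with respect to the softmax logits.
        grad_return = pi * (q_values - pi @ q_values)
        # Gradient of KL(behavior || pi) with respect to the logits.
        grad_kl = pi - behavior
        # Primal step: gradient ascent on the Lagrangian in theta.
        theta = theta + lr_theta * (grad_return - lam * grad_kl)
        # Dual step: raise lam while the constraint is violated, keep lam >= 0.
        lam = max(0.0, lam + lr_lam * (kl(behavior, softmax(theta)) - eps))
    return softmax(theta), lam

q_values = np.array([1.0, 0.0, 0.5])   # assumed action values
behavior = np.array([0.2, 0.5, 0.3])   # assumed behavior policy
eps = 0.1
pi, lam = primal_dual_cps(q_values, behavior, eps)
print("policy:", pi, "lambda:", lam, "KL:", kl(behavior, pi))
```

The dual step size is deliberately smaller than the primal one, so the policy roughly tracks the best response to the current multiplier. In this toy setting the KL term is computed exactly; the abstract's point is that this is not possible when the policy is a diffusion model.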
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification
Methods: Diffusion
