Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning

Jing Zhang; Chi Zhang; Wenjia Wang; Bing-Yi Jing

arXiv:2301.12130 · cs.LG · March 6, 2024 · 1 citation


TL;DR

This paper introduces CPED, an offline RL method that uses a flow-GAN model to estimate the behavior policy's density explicitly, so that the safe region can be identified accurately and the policy optimized inside it with less conservatism.

Contribution

The paper proposes CPED, an offline RL approach that constrains policy optimization with an explicit behavior density estimated by flow-GAN, rather than excluding OOD actions outright or making the $Q$ function pessimistic.

Findings

01

CPED outperforms existing methods on standard offline RL benchmarks.

02

The paper provides theoretical results for the flow-GAN estimator and a performance guarantee for CPED.

03

CPED achieves higher expected returns with less conservative policies.

Abstract

Due to the inability to interact with the environment, offline reinforcement learning (RL) methods face the challenge of estimating Out-of-Distribution (OOD) points. Existing methods for addressing this issue either constrain the policy to exclude OOD actions or make the $Q$ function pessimistic. However, these methods can be overly conservative or fail to identify OOD areas accurately. To overcome this problem, we propose a Constrained Policy optimization with Explicit Behavior density (CPED) method that utilizes a flow-GAN model to explicitly estimate the density of the behavior policy. By estimating the explicit density, CPED can accurately identify the safe region and enable optimization within the region, resulting in less conservative learned policies. We further provide theoretical results for both the flow-GAN estimator and performance guarantee for CPED by showing that CPED can…
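
The idea in the abstract, using an explicit density estimate to mark a safe region and optimizing only inside it, can be illustrated with a toy sketch. This is not the authors' implementation: a 1-D Gaussian stands in for the flow-GAN density, and the hinge penalty, the threshold `log_threshold`, and the weight `penalty_weight` are hypothetical choices made only for illustration. The exact constraint used by CPED is not stated in this excerpt.

```python
import math


def gaussian_log_density(action, mean, std):
    """Log-density of a 1-D Gaussian (assumed stand-in for a learned behavior density)."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def in_safe_region(log_density, log_threshold):
    """Treat an action as in-distribution when its estimated density clears the threshold."""
    return log_density >= log_threshold


def penalized_objective(q_value, log_density, log_threshold, penalty_weight):
    """Q-value minus a hinge penalty; the penalty is zero inside the safe region."""
    violation = max(0.0, log_threshold - log_density)
    return q_value - penalty_weight * violation


def pick_action(candidates, q_fn, mean, std, log_threshold, penalty_weight):
    """Return the candidate action with the highest penalized objective."""
    def score(action):
        log_p = gaussian_log_density(action, mean, std)
        return penalized_objective(q_fn(action), log_p, log_threshold, penalty_weight)
    return max(candidates, key=score)


def toy_q(action):
    """Hypothetical Q-function that grows with the action, so it favors extreme actions."""
    return action


candidates = [0.0, 0.5, 1.0, 3.0]
best = pick_action(
    candidates, toy_q, mean=0.0, std=1.0,
    log_threshold=math.log(0.05), penalty_weight=10.0,
)
print(best)  # prints 1.0
```

With the penalty switched off (`penalty_weight=0.0`) the same call returns 3.0, an action the density model marks as out-of-distribution. With the penalty on, the choice moves to 1.0, the highest-value action still inside the region, and actions inside the region are not penalized at all, which is the sense in which a density-based constraint can be less conservative than a blanket pessimistic one.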


Code & Models

Repositories

evalarzj/cped · PyTorch · Official

Videos

Constrained Policy Optimization with Explicit Behavior Density For Offline Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Mental Health Research Topics · Embodied and Extended Cognition