State-Aware Proximal Pessimistic Algorithms for Offline Reinforcement Learning

Chen Chen; Hongyao Tang; Yi Ma; Chao Wang; Qianli Shen; Dong Li; Jianye Hao

arXiv:2211.15065 · cs.LG · November 29, 2022

TL;DR

This paper introduces a novel offline RL algorithm, SA-PP, which adaptively applies pessimism based on state distribution ratios, leading to improved performance over existing methods.

Contribution

The paper proposes a state-aware framework for offline RL that modulates behavior regularization using stationary state distribution ratios, with theoretical and empirical validation.
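One way to read "modulating behavior regularization with state distribution ratios" is through a change of measure. The notation here is ours and may not match the paper's: $d^{\pi}$ is the discounted stationary state distribution of the learning policy $\pi$, $d^{\mathcal{D}}$ is the state distribution of the offline dataset, $\beta$ is the behavior policy, and $D$ is any policy divergence. Weighting a per-state penalty by the ratio $w(s)$ turns an average over dataset states into an average over the states the learning policy actually visits (assuming $d^{\mathcal{D}}(s) > 0$ wherever $d^{\pi}(s) > 0$):

```latex
\mathbb{E}_{s \sim d^{\mathcal{D}}}\!\left[ w(s)\, D\big(\pi(\cdot \mid s) \,\|\, \beta(\cdot \mid s)\big) \right]
= \mathbb{E}_{s \sim d^{\pi}}\!\left[ D\big(\pi(\cdot \mid s) \,\|\, \beta(\cdot \mid s)\big) \right],
\qquad
w(s) = \frac{d^{\pi}(s)}{d^{\mathcal{D}}(s)}
```

States with $w(s) > 1$ (visited by $\pi$ more often than the data covers them) receive a stronger penalty, and states with $w(s) < 1$ a weaker one, which is the intuition behind avoiding under-pessimism in some states and over-pessimism in others.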

Findings

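01. SA-CQL, an instantiation of SA-PP built on CQL, outperforms baselines on multiple benchmarks.

02. SA-PP yields a lower suboptimality upper bound than conventional, state-agnostic behavior regularization.

03. Extensive experiments demonstrate the effectiveness of the approach.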

Abstract

Pessimism is of great importance in offline reinforcement learning (RL). One broad category of offline RL algorithms implements pessimism through explicit or implicit behavior regularization. However, most of them consider only policy divergence as behavior regularization, ignoring how the offline state distribution differs from that of the learning policy, which may lead to under-pessimism for some states and over-pessimism for others. To address this problem, we propose a principled algorithmic framework for offline RL, called State-Aware Proximal Pessimism (SA-PP). The key idea of SA-PP is to leverage discounted stationary state distribution ratios between the learning policy and the offline dataset to modulate the degree of behavior regularization in a state-wise manner, so that pessimism can be implemented in a more appropriate way. We first provide theoretical…
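The state-wise modulation described in the abstract can be illustrated with a minimal tabular sketch. It assumes a known MDP (transition table `P`, start distribution `mu`) so that the learning policy's discounted stationary state distribution can be computed exactly by fixed-point iteration; in real offline RL this ratio would have to be estimated from data. KL divergence stands in for the behavior regularizer, and every function name below is hypothetical. This is not the paper's SA-PP or SA-CQL objective, only an illustration of weighting a per-state penalty by d^π(s) / d^D(s).

```python
import math

# Hypothetical tabular illustration of state-wise modulated behavior
# regularization; NOT the paper's implementation. Assumes a known MDP.

def policy_transition(P, pi):
    """P_pi[s][t] = sum_a pi(a|s) * P(t|s,a)."""
    n = len(P)
    return [[sum(pi[s][a] * P[s][a][t] for a in range(len(pi[s])))
             for t in range(n)] for s in range(n)]

def discounted_state_dist(P, pi, mu, gamma, iters=2000):
    """Iterate d <- (1 - gamma) * mu + gamma * P_pi^T d to its fixed point."""
    P_pi = policy_transition(P, pi)
    n = len(mu)
    d = list(mu)
    for _ in range(iters):
        d = [(1 - gamma) * mu[t] + gamma * sum(d[s] * P_pi[s][t] for s in range(n))
             for t in range(n)]
    return d

def kl(p, q):
    """KL(p || q) for discrete distributions; stands in for any policy divergence."""
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0)

def state_ratio_weights(d_pi, d_data, eps=1e-8):
    """w(s) = d^pi(s) / d^D(s); eps guards states absent from the dataset."""
    return [dp / max(dd, eps) for dp, dd in zip(d_pi, d_data)]

def state_aware_penalty(d_data, weights, pi, beta):
    """E_{s ~ d^D}[ w(s) * KL(pi(.|s) || beta(.|s)) ]."""
    return sum(d_data[s] * weights[s] * kl(pi[s], beta[s])
               for s in range(len(d_data)))

# Toy 2-state, 2-action MDP: action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
pi = [[0.2, 0.8], [0.9, 0.1]]      # learning policy
beta = [[0.5, 0.5], [0.5, 0.5]]    # behavior policy
mu = [1.0, 0.0]                    # episodes start in state 0
d_data = [0.7, 0.3]                # dataset over-covers state 0

d_pi = discounted_state_dist(P, pi, mu, gamma=0.9)
w = state_ratio_weights(d_pi, d_data)
penalty = state_aware_penalty(d_data, w, pi, beta)
print([round(x, 3) for x in d_pi], [round(x, 3) for x in w])
```

In this toy example the learning policy spends roughly 79% of its discounted time in state 1, which the dataset covers only 30% of the time, so w(1) ≈ 2.6 amplifies the penalty there while w(0) ≈ 0.3 relaxes it in the over-covered state. A uniform, state-agnostic penalty would treat both states the same.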
