State-Aware Proximal Pessimistic Algorithms for Offline Reinforcement Learning
Chen Chen, Hongyao Tang, Yi Ma, Chao Wang, Qianli Shen, Dong Li, Jianye Hao

TL;DR
This paper introduces SA-PP, a novel offline RL framework that adaptively applies pessimism based on state distribution ratios and improves performance over existing methods.
Contribution
The paper proposes a state-aware framework for offline RL that modulates behavior regularization using stationary state distribution ratios, with theoretical and empirical validation.
Findings
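SA-CQL, an instantiation of SA-PP built on Conservative Q-Learning (CQL), outperforms baselines on multiple benchmarks.
SA-PP achieves a lower suboptimality upper bound than previous behavior-regularized offline RL algorithms.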
Extensive experiments demonstrate the effectiveness of the approach.
Abstract
Pessimism is of great importance in offline reinforcement learning (RL). One broad category of offline RL algorithms implements pessimism through explicit or implicit behavior regularization. However, most of these algorithms consider only policy divergence as the behavior regularizer. They ignore how the offline state distribution differs from that of the learning policy, which can lead to under-pessimism in some states and over-pessimism in others. To address this problem, we propose a principled algorithmic framework for offline RL called \emph{State-Aware Proximal Pessimism} (SA-PP). The key idea of SA-PP is to use the discounted stationary state distribution ratios between the learning policy and the offline dataset to modulate the degree of behavior regularization in a state-wise manner, so that pessimism is applied more appropriately across states. We first provide theoretical…
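The abstract does not specify SA-PP's exact weighting rule, so the following is only a minimal sketch of the general idea of state-wise modulation. It rests on several assumptions: discrete actions, a KL-divergence behavior regularizer, and per-state ratio estimates of d^π(s)/d^D(s) supplied from elsewhere. The function names, the `alpha` parameter, and the normalization of the ratios to mean 1 over the batch are all hypothetical illustrative choices, not the authors' method. Under these assumptions, each sampled state's policy divergence is scaled by its ratio. States that the learning policy visits more often than the dataset receive stronger regularization, and rarely visited states receive weaker regularization.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def state_aware_regularizer(
    ratios: Sequence[float],                    # hypothetical estimates of d^pi(s) / d^D(s), one per state
    policy_probs: Sequence[Sequence[float]],    # pi(.|s) for each sampled state
    behavior_probs: Sequence[Sequence[float]],  # beta(.|s) for each sampled state
    alpha: float = 1.0,                         # hypothetical global regularization strength
) -> float:
    """Batch mean of alpha * w(s) * KL(pi(.|s) || beta(.|s)).

    Illustrative assumption: w(s) is the ratio normalized to mean 1 over the
    batch, so the average strength stays alpha while its distribution across
    states follows the (estimated) state distribution ratio.
    """
    n = len(ratios)
    mean_ratio = sum(ratios) / n
    if mean_ratio <= 0.0:
        raise ValueError("ratios must have a positive mean")
    weights = [r / mean_ratio for r in ratios]
    divs = [kl_divergence(p, q) for p, q in zip(policy_probs, behavior_probs)]
    return alpha * sum(w * d for w, d in zip(weights, divs)) / n


if __name__ == "__main__":
    pi = [[0.9, 0.1], [0.5, 0.5]]    # state 0 deviates from the behavior policy; state 1 does not
    beta = [[0.5, 0.5], [0.5, 0.5]]
    uniform = state_aware_regularizer([1.0, 1.0], pi, beta)  # plain, state-agnostic regularization
    aware = state_aware_regularizer([3.0, 1.0], pi, beta)    # state 0 is over-visited by pi
    print(uniform, aware)  # aware > uniform: the deviating, over-visited state is penalized more
```

With all ratios equal, the sketch reduces to ordinary state-agnostic behavior regularization. The ratio only redistributes how strongly each state is regularized.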
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Advanced Multi-Objective Optimization Algorithms
