Iteratively Refined Behavior Regularization for Offline Reinforcement Learning
Xiaohan Hu, Yi Ma, Chenjun Xiao, Yan Zheng, Jianye Hao

TL;DR
This paper introduces an iterative refinement approach for behavior regularization in offline reinforcement learning, improving policy robustness and performance by gradually updating the reference policy to avoid out-of-sample actions.
Contribution
It proposes a novel iterative refinement algorithm based on conservative policy iteration that enhances behavior regularization in offline RL, with theoretical guarantees and practical improvements.
Findings
Outperforms state-of-the-art methods on D4RL benchmarks
Capable of learning the in-sample optimal policy in tabular settings
Easy to implement with minimal code modifications
Abstract
One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to the data distribution. Whether the data originates from a near-optimal policy or not, we anticipate that an algorithm should demonstrate its ability to learn an effective control policy that seamlessly aligns with the inherent distribution of offline data. Unfortunately, behavior regularization, a simple yet effective offline RL algorithm, tends to struggle in this regard. In this paper, we propose a new algorithm that substantially enhances behavior regularization based on conservative policy iteration. Our key observation is that by iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement, while also implicitly avoiding querying out-of-sample actions to prevent catastrophic learning failures. We prove that in the…
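The key observation in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation. It assumes the regularized update takes the standard closed form for maximizing expected value minus a KL penalty toward the reference policy, namely pi(a|s) proportional to pi_ref(a|s) * exp(Q(s,a) / tau). The names `regularized_update`, `iterative_refinement`, `tau`, and `n_iters` are hypothetical, and Q is held fixed for brevity, whereas a full algorithm would re-estimate it for each new policy.

```python
import numpy as np

def regularized_update(q, pi_ref, tau):
    """Per-state maximizer of E_pi[q] - tau * KL(pi || pi_ref) (assumed form).

    q and pi_ref have shape (n_states, n_actions). Actions with zero
    reference probability keep zero probability, so out-of-sample
    actions are never selected, whatever their Q-value estimate.
    """
    support = pi_ref > 0
    # Work in log space; actions outside the support get log-weight -inf.
    log_ref = np.log(np.where(support, pi_ref, 1.0))
    log_unnorm = np.where(support, log_ref + q / tau, -np.inf)
    # Subtract the row maximum before exponentiating for numerical stability.
    log_unnorm = log_unnorm - log_unnorm.max(axis=1, keepdims=True)
    unnorm = np.exp(log_unnorm)
    return unnorm / unnorm.sum(axis=1, keepdims=True)

def iterative_refinement(q, pi_behavior, tau=1.0, n_iters=50):
    """Repeatedly replace the reference policy with the updated policy."""
    pi_ref = np.asarray(pi_behavior, dtype=float).copy()
    for _ in range(n_iters):
        # Simplification: q is fixed here; in practice it would be
        # re-evaluated for the current policy at every iteration.
        pi_ref = regularized_update(q, pi_ref, tau)
    return pi_ref

# One state, three actions; the third action never appears in the data
# but has an (erroneously) large Q-value estimate.
q = np.array([[1.0, 2.0, 10.0]])
pi_behavior = np.array([[0.5, 0.5, 0.0]])
pi = iterative_refinement(q, pi_behavior)
```

In this toy case the refined policy concentrates on the best action that the behavior policy actually took, while the out-of-sample action keeps probability zero throughout. A single regularized step against a fixed behavior policy would instead stay close to the 50/50 split.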
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
