Iteratively Refined Behavior Regularization for Offline Reinforcement Learning

Xiaohan Hu; Yi Ma; Chenjun Xiao; Yan Zheng; Jianye Hao

arXiv:2306.05726 · cs.LG · October 18, 2023 · 1 cites

TL;DR

This paper introduces an iterative refinement approach for behavior regularization in offline reinforcement learning, improving policy robustness and performance by gradually updating the reference policy to avoid out-of-sample actions.

Contribution

It proposes a novel iterative refinement algorithm based on conservative policy iteration that enhances behavior regularization in offline RL, with theoretical guarantees and practical improvements.

Findings

01 Outperforms state-of-the-art methods on D4RL benchmarks

02 Capable of learning the in-sample optimal policy in tabular settings

03 Easy to implement with minimal code modifications

Abstract

One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to data distribution. Whether the data originates from a near-optimal policy or not, we anticipate that an algorithm should demonstrate its ability to learn an effective control policy that seamlessly aligns with the inherent distribution of offline data. Unfortunately, behavior regularization, a simple yet effective offline RL algorithm, tends to struggle in this regard. In this paper, we propose a new algorithm that substantially enhances behavior regularization based on conservative policy iteration. Our key observation is that by iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement, while also implicitly avoiding querying out-of-sample actions to prevent catastrophic learning failures. We prove that in the…
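To make the idea of iteratively refining the reference policy concrete, here is a minimal tabular sketch. It is not the authors' implementation. It assumes a KL-regularized policy update with the closed form pi(a|s) ∝ pi_ref(a|s) · exp(Q(s,a)/tau), exact model-based policy evaluation, and a hypothetical two-state toy MDP. The names `evaluate_q`, `regularized_step`, `iterative_refinement`, and all numeric settings are illustrative assumptions.

```python
import numpy as np

def evaluate_q(P, R, pi, gamma):
    """Exact Q^pi for a tabular MDP. P: (S, A, S), R: (S, A), pi: (S, A)."""
    S, _ = R.shape
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state transitions under pi
    r_pi = (pi * R).sum(axis=1)             # expected one-step reward under pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return R + gamma * (P @ v)              # shape (S, A)

def regularized_step(q, pi_ref, tau):
    """Per-state maximizer of E_pi[q] - tau * KL(pi || pi_ref) (assumed form)."""
    # Actions outside the reference support are masked out and keep zero mass.
    logits = np.where(pi_ref > 0, q / tau, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    unnorm = pi_ref * np.exp(logits)
    return unnorm / unnorm.sum(axis=1, keepdims=True)

def iterative_refinement(P, R, pi_b, gamma=0.9, tau=1.0, n_iters=50):
    """Start from the behavior policy; each new policy becomes the next reference."""
    pi_ref = pi_b.copy()
    for _ in range(n_iters):
        q = evaluate_q(P, R, pi_ref, gamma)
        pi_ref = regularized_step(q, pi_ref, tau)
    return pi_ref

# Hypothetical toy MDP: action 2 pays the most but never appears in the data.
S, A = 2, 3
P = np.zeros((S, A, S))
R = np.zeros((S, A))
for s in range(S):
    P[s, 0, 0] = 1.0             # action 0 always leads to state 0
    P[s, 1, 1] = 1.0             # action 1 always leads to state 1
    P[s, 2, s] = 1.0             # action 2 stays in the current state
    R[s] = [0.0, 1.0, 10.0]
pi_b = np.tile([0.5, 0.5, 0.0], (S, 1))  # behavior policy never takes action 2

pi_final = iterative_refinement(P, R, pi_b)
print(np.round(pi_final, 3))
```

In this sketch the multiplicative update keeps zero probability on any action the behavior policy never took, so the out-of-sample action is never selected even though its reward is highest, while mass shifts toward the better in-sample action. In a real offline setting Q would be estimated from data rather than computed from a known model.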

Videos

Iteratively Refined Behavior Regularization for Offline Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning