SelfBC: Self Behavior Cloning for Offline Reinforcement Learning

Shirong Liu, Chenjia Bai, Zixian Guo, Hao Zhang, Gaurav Sharma, and Yang Liu

arXiv:2408.02165·cs.LG·August 6, 2024

Open Access

TL;DR

SelfBC introduces a dynamic self-constraint for offline reinforcement learning: instead of being tied only to the static offline dataset, the learned policy is constrained on samples generated by an exponential moving average of its own previously learned policies. The authors report that this yields less conservative policies and state-of-the-art results on D4RL MuJoCo tasks.

Contribution

The paper proposes a novel dynamic policy constraint using an exponential moving average of policies, improving over static constraints in offline RL.

Findings

1. Achieves state-of-the-art performance on D4RL MuJoCo tasks.
2. Shows theoretically that the reference policy improves nearly monotonically.
3. Balances policy flexibility against stability in offline RL, avoiding both over-conservatism and policy collapse.

Abstract

Policy constraint methods in offline reinforcement learning employ additional regularization techniques to constrain the discrepancy between the learned policy and the offline dataset. However, these methods tend to result in overly conservative policies that resemble the behavior policy, thus limiting their performance. We investigate this limitation and attribute it to the static nature of traditional constraints. In this paper, we propose a novel dynamic policy constraint that restricts the learned policy on the samples generated by the exponential moving average of previously learned policies. By integrating this self-constraint mechanism into off-policy methods, our method facilitates the learning of non-conservative policies while avoiding policy collapse in the offline setting. Theoretical results show that our approach results in a nearly monotonically improved reference policy.…
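The self-constraint idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the exponential moving average is taken over policy parameters, uses a linear deterministic policy, and measures the constraint as a mean squared action gap. The names `ema_update`, `linear_policy`, `self_bc_penalty`, and the step size `tau` are hypothetical, and the off-policy RL objective that the constraint would be added to is omitted.

```python
from typing import List


def ema_update(reference: List[float], current: List[float], tau: float) -> List[float]:
    """Move each reference parameter a fraction tau toward the current policy (assumed EMA form)."""
    return [(1.0 - tau) * r + tau * c for r, c in zip(reference, current)]


def linear_policy(params: List[float], state: List[float]) -> float:
    """Toy deterministic policy: action = dot(params, state)."""
    return sum(p * s for p, s in zip(params, state))


def self_bc_penalty(
    policy_params: List[float],
    reference_params: List[float],
    states: List[List[float]],
) -> float:
    """Mean squared gap between the learned policy's actions and the reference policy's actions."""
    gaps = [
        (linear_policy(policy_params, s) - linear_policy(reference_params, s)) ** 2
        for s in states
    ]
    return sum(gaps) / len(gaps)


# Toy usage: the reference starts far from the learned policy and drifts toward it.
states = [[1.0, 0.0], [0.0, 1.0]]
policy = [1.0, 1.0]
reference = [0.0, 0.0]

penalty_before = self_bc_penalty(policy, reference, states)
reference = ema_update(reference, policy, tau=0.5)
penalty_after = self_bc_penalty(policy, reference, states)
```

Because the reference tracks the learned policy rather than staying fixed at the behavior policy, the penalty shrinks as the reference catches up, which is one way to read the abstract's contrast between dynamic and static constraints. A small `tau` would make the reference move slowly and keep the constraint tighter.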

Taxonomy

Topics: Evolutionary Algorithms and Applications · Mental Health Research Topics