SelfBC: Self Behavior Cloning for Offline Reinforcement Learning
Shirong Liu, Chenjia Bai, Zixian Guo, Hao Zhang, Gaurav Sharma, and Yang Liu

TL;DR
SelfBC introduces a dynamic self-constraint for offline reinforcement learning. Instead of tying the policy to a fixed behavior policy, it adaptively restricts the policy toward an exponential moving average of its own past versions. This lets it learn more effective, less conservative policies and outperform existing methods.
Contribution
The paper proposes a novel dynamic policy constraint using an exponential moving average of policies, improving over static constraints in offline RL.
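Below is a minimal sketch of the exponential moving average that defines the reference policy. It assumes the policy is stored as a dictionary of NumPy parameter arrays and updated once per training step. The function name `ema_update` and the step size `tau` are hypothetical illustrations and are not taken from the paper.

```python
import numpy as np

def ema_update(ref_params, new_params, tau=0.005):
    """Move each reference parameter a fraction tau toward the current policy.

    ref_params, new_params: dicts mapping parameter names to NumPy arrays.
    tau: hypothetical EMA step size (an assumed value, not from the paper).
    """
    return {
        name: (1.0 - tau) * ref_params[name] + tau * new_params[name]
        for name in ref_params
    }

# Usage: a reference policy drifting toward a fixed current policy.
ref = {"w": np.zeros(2)}
cur = {"w": np.ones(2)}
for _ in range(3):
    ref = ema_update(ref, cur, tau=0.5)
print(ref["w"])  # each step halves the gap: 0.5, 0.75, 0.875
```

A small `tau` makes the reference policy change slowly, so it can act as a stable constraint target while still tracking improvements in the learned policy.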
Findings
Achieves state-of-the-art performance on D4RL MuJoCo tasks.
Demonstrates nearly monotonic improvement of the reference policy.
Effectively balances policy flexibility and stability in offline RL.
Abstract
Policy constraint methods in offline reinforcement learning employ additional regularization techniques to constrain the discrepancy between the learned policy and the offline dataset. However, these methods tend to result in overly conservative policies that resemble the behavior policy, thus limiting their performance. We investigate this limitation and attribute it to the static nature of traditional constraints. In this paper, we propose a novel dynamic policy constraint that restricts the learned policy on the samples generated by the exponential moving average of previously learned policies. By integrating this self-constraint mechanism into off-policy methods, our method facilitates the learning of non-conservative policies while avoiding policy collapse in the offline setting. Theoretical results show that our approach results in a nearly monotonically improved reference policy.…
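The self-constraint can be sketched as a TD3+BC-style actor loss in which the behavior-cloning target is an action from the EMA reference policy, not an action from the dataset. This is an illustrative assumption, not the paper's exact objective. The linear-tanh `policy`, the `critic` callable, the name `self_bc_actor_loss`, and the weight `alpha` are all hypothetical. The function only evaluates the loss; a real implementation would differentiate it with an autodiff framework.

```python
import numpy as np

def policy(params, states):
    # Hypothetical deterministic linear-tanh policy: a = tanh(s @ W + b).
    return np.tanh(states @ params["W"] + params["b"])

def self_bc_actor_loss(params, ref_params, critic, states, alpha=2.5):
    """Maximize Q while staying close to the EMA reference policy.

    critic: callable (states, actions) -> Q-values of shape (batch,).
    alpha: hypothetical trade-off weight (TD3+BC-style scaling assumed).
    """
    actions = policy(params, states)
    # Constraint targets are generated by the reference policy, not read from the dataset.
    ref_actions = policy(ref_params, states)
    q = critic(states, actions)
    # Normalize by the average |Q| so the Q term and the BC term have comparable scales.
    lam = alpha / (np.abs(q).mean() + 1e-8)
    bc = np.mean((actions - ref_actions) ** 2)
    return -lam * q.mean() + bc

# Usage with a constant critic: identical policies incur no BC penalty.
states = np.ones((4, 3))
params = {"W": np.zeros((3, 2)), "b": np.zeros(2)}
ref_params = {"W": np.zeros((3, 2)), "b": np.full(2, 0.5)}
const_critic = lambda s, a: np.ones(len(s))
same = self_bc_actor_loss(params, params, const_critic, states)
diff = self_bc_actor_loss(params, ref_params, const_critic, states)
```

The reference policy itself drifts toward the learned policy, so the constraint target moves during training instead of staying tied to the dataset. This is the dynamic behavior the abstract contrasts with static constraints.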
