TL;DR
This paper introduces a novel offline reinforcement learning algorithm that constrains policy behavior directly in weight space, improving learning from fixed datasets.
Contribution
The paper presents a new method for offline RL that constrains the policy in weight space rather than penalizing a divergence between action distributions, and reports promising experimental results.
Findings
Effective policy learning from fixed datasets
Outperforms divergence-based regularization methods
Demonstrates robustness in various environments
Abstract
In offline reinforcement learning, a policy needs to be learned from a single pre-collected dataset. Typically, policies are therefore regularized during training to behave similarly to the data-generating policy by adding a penalty based on a divergence between the action distributions of the generating and trained policies. We propose a new algorithm that instead constrains the policy directly in its weight space, and we demonstrate its effectiveness in experiments.
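The abstract does not say how the weight-space constraint is enforced, so the following is only a minimal sketch of one possible reading, not the authors' algorithm. It assumes the trained weights are kept inside an L2 ball around reference weights (here imagined as a behavior-cloned policy) by projecting after each gradient step. The names `project_to_weight_ball`, `constrained_step`, `theta_bc`, and `radius` are hypothetical and chosen for illustration.

```python
import numpy as np

def project_to_weight_ball(theta, theta_ref, radius):
    """Project theta onto the L2 ball of the given radius centred at theta_ref.

    Assumption: an L2 ball is used purely for illustration; the source
    does not specify the form of the weight-space constraint.
    """
    delta = theta - theta_ref
    norm = np.linalg.norm(delta)
    if norm <= radius:
        return theta.copy()
    return theta_ref + delta * (radius / norm)

def constrained_step(theta, grad, theta_ref, lr=0.1, radius=1.0):
    """One gradient-ascent step on a policy objective, followed by projection."""
    return project_to_weight_ball(theta + lr * grad, theta_ref, radius)

# Toy usage: theta_bc stands in for weights fitted to the dataset
# (hypothetical), and grad for a fixed policy-improvement gradient.
theta_bc = np.zeros(4)
theta = theta_bc.copy()
grad = np.array([3.0, 4.0, 0.0, 0.0])
for _ in range(5):
    theta = constrained_step(theta, grad, theta_bc, lr=1.0, radius=1.0)

# The weights never drift further than `radius` from the reference.
distance = np.linalg.norm(theta - theta_bc)
```

In contrast to a divergence penalty, which compares action distributions and must be evaluated on sampled states, a constraint of this kind depends only on the parameters themselves.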
