Behavior Constraining in Weight Space for Offline Reinforcement Learning

Phillip Swazinna; Steffen Udluft; Daniel Hein; Thomas Runkler

arXiv:2107.05479·cs.LG·July 13, 2021

TL;DR

This paper introduces an offline reinforcement learning algorithm that keeps the learned policy close to the data-generating policy by constraining it directly in weight space.

Contribution

The paper presents a method for offline RL that constrains the policy in its weight space, rather than penalizing a divergence between action distributions, and reports experiments demonstrating its effectiveness.

Findings

01. Effective policy learning from fixed datasets

02. Outperforms divergence-based regularization methods

03. Demonstrates robustness in various environments

Abstract

In offline reinforcement learning, a policy needs to be learned from a single pre-collected dataset. Typically, policies are thus regularized during training to behave similarly to the data-generating policy, by adding a penalty based on a divergence between the action distributions of the generating and the trained policy. We propose a new algorithm, which instead constrains the policy directly in its weight space, and demonstrate its effectiveness in experiments.
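To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a deterministic linear policy, uses a mean squared action gap as a stand-in for a divergence between action distributions, and assumes the weight-space constraint takes the form of a projection onto an L2 ball around the behavior policy's weights. All names (`linear_policy`, `action_penalty`, `project_to_weight_ball`, `radius`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def linear_policy(weights, states):
    """Hypothetical deterministic linear policy: one action per state row."""
    return states @ weights


def action_penalty(weights, behavior_weights, states):
    """Action-space regularizer (stand-in for a divergence): mean squared
    gap between the trained and the behavior policy's actions on the data."""
    gap = linear_policy(weights, states) - linear_policy(behavior_weights, states)
    return float(np.mean(gap ** 2))


def project_to_weight_ball(weights, behavior_weights, radius):
    """Assumed weight-space constraint: if the weights have left an L2 ball
    of the given radius around behavior_weights, scale them back onto it."""
    delta = weights - behavior_weights
    norm = np.linalg.norm(delta)
    if norm <= radius:
        return weights
    return behavior_weights + delta * (radius / norm)


rng = np.random.default_rng(0)
states = rng.normal(size=(64, 3))  # placeholder for states from the fixed dataset

# Stands in for a policy fitted to the dataset (e.g. by behavior cloning).
behavior_weights = np.array([[0.5], [-0.2], [0.1]])

# A candidate after some unconstrained policy improvement steps (assumed values).
candidate = behavior_weights + np.array([[3.0], [0.0], [4.0]])

constrained = project_to_weight_ball(candidate, behavior_weights, radius=1.0)

print("weight distance before:", np.linalg.norm(candidate - behavior_weights))
print("weight distance after: ", np.linalg.norm(constrained - behavior_weights))
print("action penalty before: ", action_penalty(candidate, behavior_weights, states))
print("action penalty after:  ", action_penalty(constrained, behavior_weights, states))
```

In this sketch the action-space approach would add `action_penalty` to the training loss, whereas the weight-space approach never evaluates actions at all and only limits how far the parameters may move. For this linear policy, shrinking the weight distance also shrinks the action gap, which is why both quantities drop after projection.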

Code & Models

Repositories

siemens/industrialbenchmark (tf, official)
