Offline Reinforcement Learning with Fisher Divergence Critic Regularization
Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum

TL;DR
This paper introduces Fisher-BRC, a novel offline RL algorithm that regularizes the critic using Fisher divergence, leading to improved performance and faster convergence on benchmark tasks.
Contribution
It proposes a new critic regularization method based on Fisher divergence, connecting energy-based models with offline RL, and demonstrates its effectiveness.
Findings
Fisher-BRC outperforms existing methods on standard offline RL benchmarks.
It also converges faster than those methods.
The learned policy stays close to the offline data.
Abstract
Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor critic algorithm with a penalty measuring divergence of the policy from the offline data. In this work, we propose an alternative approach to encouraging the learned policy to stay close to the data, namely parameterizing the critic as the log-behavior-policy, which generated the offline data, plus a state-action value offset term, which can be learned using a neural network. Behavior regularization then corresponds to an appropriate regularizer on the offset term. We propose using a gradient penalty regularizer for the offset term and demonstrate its equivalence to Fisher divergence regularization, suggesting connections to the score matching and generative energy-based model literature. We thus term our resulting algorithm Fisher-BRC (Behavior…
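The critic parameterization described in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation: the 1-D Gaussian behavior policy, the quadratic offset function, its weights, and the finite-difference gradient are all assumptions chosen so the example runs with the standard library alone. In the paper the offset is a neural network and the behavior policy is fitted to the offline data.

```python
import math


def log_behavior_density(action, mean=0.0, std=0.5):
    """Log-density of a hypothetical 1-D Gaussian behavior policy mu(a|s).

    Assumption: a fixed Gaussian stands in for a behavior policy that
    would normally be fitted to the offline data.
    """
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def offset(state, action, w=(0.3, -0.2, 0.1)):
    """Hypothetical offset term O(s, a); the weights w are illustrative."""
    return w[0] * state + w[1] * action + w[2] * action * action


def critic(state, action):
    """Critic value: offset term plus log-behavior-policy."""
    return offset(state, action) + log_behavior_density(action)


def gradient_penalty(state, action, eps=1e-5):
    """Squared derivative of the offset with respect to the action.

    Approximated by central differences; an autodiff library would
    normally compute this gradient.
    """
    grad = (offset(state, action + eps) - offset(state, action - eps)) / (2.0 * eps)
    return grad * grad


if __name__ == "__main__":
    for a in (0.0, 1.0, 3.0):
        print(a, critic(0.0, a), gradient_penalty(0.0, a))
```

Under these assumptions, the log-behavior term dominates for actions far from the data, so the critic assigns them low values even when the offset alone would rate them higher. The gradient penalty keeps the offset from varying sharply in the action, which limits how far it can override that term.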
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and ELM · Adaptive Dynamic Programming Control
Methods: Fisher-BRC
