Offline Reinforcement Learning with Fisher Divergence Critic Regularization

Ilya Kostrikov; Jonathan Tompson; Rob Fergus; Ofir Nachum

arXiv:2103.08050·cs.LG·March 16, 2021·25 cites

TL;DR

This paper introduces Fisher-BRC, a novel offline RL algorithm that regularizes the critic using Fisher divergence, leading to improved performance and faster convergence on benchmark tasks.

Contribution

It proposes a new critic regularization method based on Fisher divergence, connecting energy-based models with offline RL, and demonstrates its effectiveness.

Findings

1. Fisher-BRC outperforms existing methods on standard offline RL benchmarks.

2. It converges faster than existing methods.

3. It keeps the learned policy close to the offline data.

Abstract

Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor critic algorithm with a penalty measuring divergence of the policy from the offline data. In this work, we propose an alternative approach to encouraging the learned policy to stay close to the data, namely parameterizing the critic as the log-behavior-policy, which generated the offline data, plus a state-action value offset term, which can be learned using a neural network. Behavior regularization then corresponds to an appropriate regularizer on the offset term. We propose using a gradient penalty regularizer for the offset term and demonstrate its equivalence to Fisher divergence regularization, suggesting connections to the score matching and generative energy-based model literature. We thus term our resulting algorithm Fisher-BRC (Behavior…
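The parameterization described in the abstract can be illustrated with a small sketch: the critic is the log-density of the behavior policy plus an offset term, and the regularizer penalizes the squared gradient of the offset with respect to the action. Everything concrete below is an assumption made for illustration and is not taken from the paper: the Gaussian behavior policy, the bilinear offset, the scalar action, the finite-difference gradient, and the names `w_mu`, `w_o`, and `gradient_penalty`. In practice the offset would be a neural network differentiated with autodiff.

```python
import math


def log_behavior_policy(state, action, w_mu, sigma=0.5):
    """Log-density of a hypothetical Gaussian behavior policy a ~ N(w_mu . s, sigma^2)."""
    mean = sum(w * s for w, s in zip(w_mu, state))
    z = (action - mean) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def offset(state, action, w_o):
    """Hypothetical offset term O(s, a) = (w_o . s) * a, standing in for a neural network."""
    return sum(w * s for w, s in zip(w_o, state)) * action


def critic(state, action, w_mu, w_o):
    """Critic parameterized as log-behavior-policy plus the offset term."""
    return log_behavior_policy(state, action, w_mu) + offset(state, action, w_o)


def gradient_penalty(batch, w_o, eps=1e-4):
    """Mean squared action-gradient of the offset over a batch of (state, action) pairs.

    The gradient is approximated with a central finite difference (an assumption
    made to keep the sketch dependency-free).
    """
    total = 0.0
    for state, action in batch:
        grad = (offset(state, action + eps, w_o)
                - offset(state, action - eps, w_o)) / (2.0 * eps)
        total += grad * grad
    return total / len(batch)


if __name__ == "__main__":
    batch = [([1.0, 2.0], 0.5), ([0.0, -1.0], -0.3)]
    w_mu, w_o = [0.2, -0.1], [1.0, 0.5]
    print("critic value:", critic(batch[0][0], batch[0][1], w_mu, w_o))
    print("gradient penalty:", gradient_penalty(batch, w_o))
```

With a zero offset the critic reduces to the log-behavior-policy, so its values are highest on actions the behavior policy considers likely. The penalty limits how sharply the offset can change across actions, which is the role the abstract assigns to the regularizer.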

Code & Models

Videos

Offline Reinforcement Learning with Fisher Divergence Critic Regularization · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and ELM · Adaptive Dynamic Programming Control

Methods: Fisher-BRC