A Minimalist Approach to Offline Reinforcement Learning
Scott Fujimoto, Shixiang Shane Gu

TL;DR
This paper presents a minimalist offline RL method that adds a behavior cloning term and data normalization to a standard RL algorithm, achieving competitive performance with less complexity and computational cost.
Contribution
It introduces a simple, effective offline RL approach that makes only minimal modifications to an existing algorithm, avoiding the extra components and hyperparameters that other offline methods require.
Findings
Matches state-of-the-art offline RL performance
Halves runtime compared to complex methods
Simplifies implementation and tuning
Abstract
Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data. Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms take the approach of constraining or regularizing the policy with the actions contained in the dataset. Because they are built on pre-existing RL algorithms, the modifications that make an RL algorithm work offline come at the cost of additional complexity. Offline RL algorithms introduce new hyperparameters and often leverage secondary components such as generative models, while adjusting the underlying RL algorithm. In this paper we aim to make a deep RL algorithm work while making minimal changes. We find that we can match the performance of state-of-the-art offline RL algorithms by simply adding a behavior cloning term to the policy update of an online RL algorithm and normalizing the data. The resulting algorithm is a…
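The two changes named in the abstract, a behavior cloning term in the policy update and normalization of the data, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the `bc_weight` parameter, the `eps` constant, and the choice of a mean-squared-error cloning penalty are assumptions made for illustration, and plain Python lists stand in for the tensors a deep RL library would use.

```python
from statistics import fmean, pstdev


def normalize_states(states, eps=1e-3):
    """Standardize each state dimension with dataset-wide statistics.

    `states` is a list of equal-length lists (one row per transition).
    `eps` (an assumed constant) guards against division by zero.
    """
    columns = list(zip(*states))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) + eps for col in columns]
    normalized = [
        [(x - m) / s for x, m, s in zip(row, means, stds)] for row in states
    ]
    return normalized, means, stds


def policy_loss_with_bc(q_values, policy_actions, dataset_actions, bc_weight=1.0):
    """Policy loss = RL term (maximize Q) + weighted behavior cloning term.

    q_values:        Q(s, pi(s)) for each state in the batch
    policy_actions:  pi(s) for each state (list of action vectors)
    dataset_actions: the action stored in the dataset for each state
    bc_weight:       hypothetical trade-off coefficient (an assumption)
    """
    rl_term = -fmean(q_values)  # minimizing this maximizes the mean Q-value
    squared_errors = [
        (p - d) ** 2
        for pa, da in zip(policy_actions, dataset_actions)
        for p, d in zip(pa, da)
    ]
    bc_term = fmean(squared_errors)  # pulls pi(s) toward dataset actions
    return rl_term + bc_weight * bc_term
```

When the policy reproduces the dataset actions exactly, the cloning term vanishes and only the RL term remains. The further the policy drifts toward actions absent from the data, the larger the penalty grows, which is the regularizing effect the abstract describes.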
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Smart Grid Energy Management
