Offline Actor-Critic Reinforcement Learning Scales to Large Models

Jost Tobias Springenberg; Abbas Abdolmaleki; Jingwei Zhang; Oliver Groth; Michael Bloesch; Thomas Lampe; Philemon Brakel; Sarah Bechtle; Steven Kapturowski; Roland Hafner; Nicolas Heess; Martin Riedmiller

arXiv:2402.05546 · cs.LG · February 9, 2024 · 2 cites

TL;DR

This paper demonstrates that offline actor-critic reinforcement learning scales to large models such as transformers, outperforms behavioral cloning in multi-task continuous control, and can learn policies that master multiple domains from sub-optimal data.

Contribution

The paper introduces a Perceiver-based actor-critic model and identifies the model features needed to make offline RL work with self- and cross-attention modules.

Findings

01 Offline actor-critic RL follows scaling laws similar to those of supervised learning.

02 It outperforms strong behavioral cloning baselines in multi-task training across 132 continuous control tasks.

03 It enables multi-task learning from sub-optimal demonstrations.

Abstract

We show that offline actor-critic reinforcement learning can scale to large models, such as transformers, and follows scaling laws similar to those of supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key model features needed to make offline RL work with self- and cross-attention modules. Overall, we find that: i) simple offline actor-critic algorithms are a natural choice for gradually moving away from the currently predominant paradigm of behavioral cloning, and ii) via offline RL it is possible to learn multi-task policies that master many domains simultaneously, including real robotics tasks, from sub-optimal…
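To make the idea of "gradually moving away from behavioral cloning" concrete, here is a minimal sketch of one generic way an actor objective can interpolate between pure behavioral cloning and critic-driven policy improvement. This is not the paper's algorithm: the abstract does not state the loss, so the mixing weight `alpha`, the squared-error cloning term, and the hand-written `toy_critic` are all assumptions chosen purely for illustration.

```python
import numpy as np


def bc_loss(pred_actions, data_actions):
    """Behavioral cloning term: mean squared distance to dataset actions."""
    return float(np.mean(np.sum((pred_actions - data_actions) ** 2, axis=-1)))


def toy_critic(states, actions):
    """Hypothetical stand-in for a learned Q-function.

    It assigns the highest value (0) to the action -state, so the
    "optimal" action is known in closed form for this toy example.
    """
    return -np.sum((actions + states) ** 2, axis=-1)


def actor_loss(states, pred_actions, data_actions, alpha):
    """Blend of cloning and Q-maximization (illustrative assumption).

    alpha = 1.0 -> pure behavioral cloning
    alpha = 0.0 -> pure critic maximization
    """
    rl_term = -float(np.mean(toy_critic(states, pred_actions)))
    return alpha * bc_loss(pred_actions, data_actions) + (1.0 - alpha) * rl_term


# Two states with sub-optimal dataset actions (all zeros).
states = np.array([[1.0], [2.0]])
data_actions = np.array([[0.0], [0.0]])

# A policy that copies the data versus one that follows the critic.
cloned = data_actions.copy()
improved = -states

for alpha in (1.0, 0.5, 0.0):
    print(alpha,
          actor_loss(states, cloned, data_actions, alpha),
          actor_loss(states, improved, data_actions, alpha))
```

With `alpha = 1.0` the loss prefers the policy that copies the sub-optimal data; with `alpha = 0.0` it prefers the policy the critic rates highest. Lowering `alpha` is one simple way to picture how an actor-critic objective can improve on the behavior found in the dataset.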
