Offline Reinforcement Learning as One Big Sequence Modeling Problem

Michael Janner; Qiyang Li; Sergey Levine

arXiv:2106.02039·cs.LG·November 30, 2021·41 cites

TL;DR

This paper proposes viewing reinforcement learning as a sequence modeling problem using Transformer architectures, enabling flexible planning and achieving state-of-the-art results in long-horizon, sparse-reward tasks.

Contribution

It applies the tools of sequence modeling to RL, using a Transformer to model distributions over trajectories and repurposing beam search as a planning algorithm, which simplifies a range of design decisions and improves performance in complex tasks.

Findings

01 Effective long-horizon dynamics prediction

02 State-of-the-art offline RL planning results

03 Versatility across multiple RL settings

Abstract

Reinforcement learning (RL) is typically concerned with estimating stationary policies or single-step models, leveraging the Markov property to factorize problems in time. However, we can also view RL as a generic sequence modeling problem, with the goal being to produce a sequence of actions that leads to a sequence of high rewards. Viewed in this way, it is tempting to consider whether high-capacity sequence prediction models that work well in other domains, such as natural-language processing, can also provide effective solutions to the RL problem. To this end, we explore how RL can be tackled with the tools of sequence modeling, using a Transformer architecture to model distributions over trajectories and repurposing beam search as a planning algorithm. Framing RL as a sequence modeling problem simplifies a range of design decisions, allowing us to dispense with many of the components…
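The abstract names beam search as the decoding procedure that gets repurposed for planning. As a rough illustration of the underlying technique only, the sketch below runs standard likelihood-ranked beam search over a toy autoregressive model. The function `toy_log_probs`, the three-token vocabulary, and all parameter names are hypothetical stand-ins, not the paper's Transformer or its trajectory tokenization. How the authors adapt the ranking criterion for planning is not described in this excerpt, so the sketch keeps the conventional cumulative log-probability score.

```python
import math
from typing import Callable, List, Sequence, Tuple

Beam = Tuple[float, List[int]]  # (cumulative log-probability, token sequence)


def toy_log_probs(prefix: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for a trained sequence model.

    Returns log-probabilities over a 3-token vocabulary, favoring the token
    that cyclically follows the last token of the prefix.
    """
    vocab_size = 3
    last = prefix[-1] if prefix else 0
    weights = [1.0] * vocab_size
    weights[(last + 1) % vocab_size] = 4.0
    total = sum(weights)
    return [math.log(w / total) for w in weights]


def beam_search(
    log_probs_fn: Callable[[Sequence[int]], List[float]],
    start: Sequence[int],
    horizon: int,
    beam_width: int,
) -> List[Beam]:
    """Extend `start` by `horizon` tokens, keeping the `beam_width` best prefixes."""
    beams: List[Beam] = [(0.0, list(start))]
    for _ in range(horizon):
        candidates: List[Beam] = []
        for score, seq in beams:
            # Expand every surviving prefix by every possible next token.
            for token, lp in enumerate(log_probs_fn(seq)):
                candidates.append((score + lp, seq + [token]))
        # Keep only the highest-scoring candidates for the next step.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams


if __name__ == "__main__":
    best_score, best_seq = beam_search(toy_log_probs, [0], horizon=3, beam_width=2)[0]
    print(best_seq, round(best_score, 3))
```

In a planning setting the tokens would presumably encode states, actions, and rewards, and the score used to rank candidates could be swapped for a reward-based quantity; that substitution is an assumption here rather than something this excerpt specifies.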

Code & Models

Videos

Offline Reinforcement Learning as One Big Sequence Modeling Problem · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Software Engineering Research · AI-based Problem Solving and Planning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Label Smoothing · Layer Normalization · Residual Connection · Dense Connections