Stabilizing Transformers for Reinforcement Learning

Emilio Parisotto; H. Francis Song; Jack W. Rae; Razvan Pascanu; Caglar Gulcehre; Siddhant M. Jayakumar; Max Jaderberg; Raphael Lopez Kaufman; Aidan Clark; Seb Noury; Matthew M. Botvinick; Nicolas Heess; Raia Hadsell

arXiv:1910.06764 · cs.LG · October 16, 2019 · 131 cites

TL;DR

This paper introduces the Gated Transformer-XL (GTrXL), a transformer architecture modified for reinforcement learning that improves training stability, learning speed, and performance on memory-intensive, partially observable tasks.

Contribution

The authors propose architectural modifications to the Transformer-XL that significantly improve its stability and efficiency in reinforcement learning settings, outperforming LSTMs and previous architectures.

Findings

01. GTrXL surpasses LSTMs on memory environments.

02. GTrXL achieves state-of-the-art results on the multi-task DMLab-30 benchmark suite.

03. GTrXL offers a more expressive alternative to LSTMs for RL agents.

Abstract

Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in natural language processing (NLP), achieving state-of-the-art results in domains such as language modeling and machine translation. Harnessing the transformer's ability to process long time horizons of information could provide a similar performance boost in partially observable reinforcement learning (RL) domains, but the large-scale transformers used in NLP have yet to be successfully applied to the RL setting. In this work we demonstrate that the standard transformer architecture is difficult to optimize, which was previously observed in the supervised learning setting but becomes especially pronounced with RL objectives. We propose architectural modifications that substantially improve…
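
One way to picture the gating that gives the Gated Transformer-XL its name is a GRU-style layer that stands in for a residual connection: instead of adding a submodule's output `y` to the skip-stream input `x`, a learned gate decides how much of `x` to keep. The NumPy sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The class name `GRUGate`, the random weight initialization, and the default `gate_bias` of 2.0 are all choices made for this example.

```python
import numpy as np


def sigmoid(a):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))


class GRUGate:
    """GRU-style gate combining a skip input x with a submodule output y.

    Hypothetical helper for illustration only; names and initialization
    are assumptions, not taken from the paper's code.
    """

    def __init__(self, dim, gate_bias=2.0, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        # Six square weight matrices: (W_*, U_*) for reset, update, candidate.
        self.W_r, self.U_r, self.W_z, self.U_z, self.W_g, self.U_g = (
            rng.normal(0.0, scale, size=(dim, dim)) for _ in range(6)
        )
        # A positive bias pushes the update gate z toward 0 at initialization,
        # so the layer starts out close to the identity map on x.
        self.gate_bias = gate_bias

    def __call__(self, x, y):
        r = sigmoid(y @ self.W_r + x @ self.U_r)                   # reset gate
        z = sigmoid(y @ self.W_z + x @ self.U_z - self.gate_bias)  # update gate
        h = np.tanh(y @ self.W_g + (r * x) @ self.U_g)             # candidate
        return (1.0 - z) * x + z * h


if __name__ == "__main__":
    gate = GRUGate(dim=8)
    x = np.random.default_rng(1).normal(size=(4, 8))  # skip-stream input
    y = np.random.default_rng(2).normal(size=(4, 8))  # submodule output
    print(gate(x, y).shape)
```

In a transformer block, `x` would be the layer's input and `y` the output of the attention or feed-forward submodule. The larger `gate_bias` is, the closer the gate starts to simply passing `x` through, which is the property this kind of gating relies on to keep early training stable.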

Code & Models

Models

LilHairdy/cleanrl_memory_gym (Hugging Face model)

Videos

Stabilizing Transformers for Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Neural dynamics and brain function

Methods: Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Gated Recurrent Unit · Gated Transformer-XL · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Sigmoid Activation · Tanh Activation · Variational Dropout