Stabilizing Transformers for Reinforcement Learning
Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew M. Botvinick, Nicolas Heess, Raia Hadsell

TL;DR
This paper introduces Gated Transformer-XL (GTrXL), a transformer architecture adapted for reinforcement learning that improves stability, learning speed, and performance on memory-intensive and partially observable tasks.
Contribution
The authors propose architectural modifications to the Transformer-XL that significantly improve its stability and efficiency in reinforcement learning settings, outperforming LSTMs and previous architectures.
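The summary above does not spell out the modifications. As a rough illustration only, the GTrXL paper describes replacing the transformer's residual connections with a GRU-style gating layer whose update-gate bias is set so that the block starts close to an identity map. The NumPy sketch below follows that idea; the class name `GRUGate`, the weight initialization, and the default bias value are assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class GRUGate:
    """Hypothetical GRU-style gate: combines a skip-stream input x with a
    sublayer output y, in place of the plain residual sum x + y."""

    def __init__(self, dim, gate_bias=2.0, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)  # assumed init scale, for illustration
        # Six weight matrices: (W_*, U_*) for reset, update, and candidate.
        self.W_r, self.U_r, self.W_z, self.U_z, self.W_g, self.U_g = (
            rng.normal(0.0, scale, size=(dim, dim)) for _ in range(6)
        )
        # A positive bias pushes the update gate z toward 0, so the
        # output starts close to x (near-identity at initialization).
        self.gate_bias = gate_bias

    def __call__(self, x, y):
        r = sigmoid(y @ self.W_r + x @ self.U_r)                   # reset gate
        z = sigmoid(y @ self.W_z + x @ self.U_z - self.gate_bias)  # update gate
        h = np.tanh(y @ self.W_g + (r * x) @ self.U_g)             # candidate
        return (1.0 - z) * x + z * h


# Usage: gate the output of an attention or feed-forward sublayer.
x = np.random.default_rng(1).normal(size=(4, 8))  # skip-stream input
y = np.random.default_rng(2).normal(size=(4, 8))  # sublayer output
out = GRUGate(dim=8)(x, y)
```

With a large `gate_bias` the gate passes `x` through almost unchanged, which is the near-identity behavior the gating is meant to provide early in training.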
Findings
GTrXL surpasses LSTMs on memory environments.
Achieves state-of-the-art results on the DMLab-30 benchmark.
Offers a more expressive alternative to LSTMs for RL.
Abstract
Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in natural language processing (NLP), achieving state-of-the-art results in domains such as language modeling and machine translation. Harnessing the transformer's ability to process long time horizons of information could provide a similar performance boost in partially observable reinforcement learning (RL) domains, but the large-scale transformers used in NLP have yet to be successfully applied to the RL setting. In this work we demonstrate that the standard transformer architecture is difficult to optimize, which was previously observed in the supervised learning setting but becomes especially pronounced with RL objectives. We propose architectural modifications that substantially improve…
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Neural dynamics and brain function
Methods: Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Gated Recurrent Unit · Gated Transformer-XL · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Sigmoid Activation · Tanh Activation · Variational Dropout
