Stabilizing Transformer-Based Action Sequence Generation For Q-Learning
Gideon Stein, Andrey Filchenkov, Arip Asadulaev

TL;DR
This paper introduces a stable Transformer-based Deep Q-Learning method that performs comparably to classic Q-learning across various environments and offers insights into integrating Transformers with Reinforcement Learning.
Contribution
It presents a novel, stable Transformer-based Deep Q-Learning approach, evaluates it across several environments, and offers insights into how Transformers relate to RL.
Findings
Matches classic Q-learning performance on control environments
Shows potential on Atari benchmarks
Provides insights into Transformer-RL integration
Abstract
Since the publication of the original Transformer architecture (Vaswani et al. 2017), Transformers have revolutionized the field of Natural Language Processing. This is mainly due to their ability to capture temporal dependencies better than competing RNN-based architectures. Surprisingly, this architectural change has not yet affected the field of Reinforcement Learning (RL), even though RNNs are quite popular in RL and temporal dependencies are very common there. Recently, Parisotto et al. (2019) conducted the first promising research on Transformers in RL. To support the findings of that work, this paper seeks to provide an additional example of a Transformer-based RL method. Specifically, the goal is a simple Transformer-based Deep Q-Learning method that is stable over several environments. Due to the unstable nature of Transformers and RL, an extensive method search was conducted to arrive at a…
Taxonomy
Topics: Reinforcement Learning in Robotics · Software Engineering Research · Evolutionary Algorithms and Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Multi-Head Attention · Layer Normalization · Byte Pair Encoding · Softmax · Adam · Dense Connections
