Stabilizing Transformer-Based Action Sequence Generation For Q-Learning

Gideon Stein; Andrey Filchenkov; Arip Asadulaev

arXiv:2010.12698 · cs.LG · December 21, 2020 · 1 citation

TL;DR

This paper introduces a stable Transformer-based Deep Q-Learning method that performs comparably to classic Q-learning across various environments and offers insights into integrating Transformers with Reinforcement Learning.

Contribution

It presents a novel, stable Transformer-based Deep Q-Learning approach and provides a comprehensive evaluation, along with insights into how Transformers relate to RL.

Findings

01. Matches classic Q-learning performance on control environments

02. Shows potential on Atari benchmarks

03. Provides insights into Transformer-RL integration

Abstract

Since the publication of the original Transformer architecture (Vaswani et al. 2017), Transformers have revolutionized the field of Natural Language Processing, mainly due to their ability to capture temporal dependencies better than competing RNN-based architectures. Surprisingly, this architectural shift has not yet affected the field of Reinforcement Learning (RL), even though RNNs are quite popular in RL and temporal dependencies are very common there. Recently, Parisotto et al. (2019) conducted the first promising research on Transformers in RL. To support the findings of that work, this paper seeks to provide an additional example of a Transformer-based RL method. Specifically, the goal is a simple Transformer-based Deep Q-Learning method that is stable across several environments. Due to the unstable nature of both Transformers and RL, an extensive method search was conducted to arrive at a…
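The abstract names Deep Q-Learning as the underlying learning rule. As a point of reference, here is a minimal NumPy sketch of the standard one-step DQN target and a Huber loss. It is not the authors' implementation: the Huber loss, the separate target network, the discount value, and the names `td_targets`, `huber_loss`, and `dqn_loss` are illustrative assumptions. In a Transformer-based variant, `q_values` and `next_q_target` would presumably be produced by a Transformer reading a window of past observations rather than a single state.

```python
import numpy as np

def td_targets(rewards, dones, next_q_target, gamma=0.99):
    """One-step Q-learning targets: y = r + gamma * (1 - done) * max_a' Q_target(s', a')."""
    # next_q_target: (batch, n_actions), assumed to come from a separate target network
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)

def huber_loss(pred, target, delta=1.0):
    """Mean Huber (smooth L1) loss: quadratic for small errors, linear for large ones."""
    abs_err = np.abs(pred - target)
    quadratic = np.minimum(abs_err, delta)
    linear = abs_err - quadratic
    return float(np.mean(0.5 * quadratic ** 2 + delta * linear))

def dqn_loss(q_values, actions, rewards, dones, next_q_target, gamma=0.99):
    """Loss between the online network's Q(s, a) for the taken actions and the TD targets."""
    q_taken = q_values[np.arange(len(actions)), actions]
    return huber_loss(q_taken, td_targets(rewards, dones, next_q_target, gamma))

# Tiny batch of two transitions with two actions each (hypothetical numbers)
q_values = np.array([[1.0, 2.0], [0.5, 0.0]])
actions = np.array([1, 0])
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])  # second transition is terminal, so it is not bootstrapped
next_q_target = np.array([[3.0, 1.0], [10.0, 10.0]])
loss = dqn_loss(q_values, actions, rewards, dones, next_q_target, gamma=0.9)
```

With `gamma=0.9` the targets are 3.7 and 0.0, and the `(1 - done)` factor is what stops the second transition from bootstrapping off the large next-state values.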

Taxonomy

Topics: Reinforcement Learning in Robotics · Software Engineering Research · Evolutionary Algorithms and Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Multi-Head Attention · Layer Normalization · Byte Pair Encoding · Softmax · Adam · Dense Connections
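Most entries in the Methods list are the building blocks of a standard Transformer encoder. The sketch below wires them into a single post-LN encoder layer that maps a window of past observations to Q-values for the latest step. It is a hypothetical NumPy illustration, not the paper's stabilized architecture: it assumes one encoder layer, no biases or learnable layer-norm parameters, no attention mask (every position in the window is already in the past), and Q-values read from the last position. All function and parameter names are made up for this example, and training (for instance with Adam) is not shown.

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Absolute sinusoidal position encodings as defined by Vaswani et al. (2017)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    # even feature indices use sine, odd indices use cosine
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def layer_norm(x, eps=1e-5):
    """Layer normalization over the feature axis (learnable gain and bias omitted)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Scaled dot-product self-attention split across n_heads (no mask, no biases)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    heads = softmax(scores) @ v
    # concatenate heads back to (seq_len, d_model), then apply the output projection
    return heads.transpose(1, 0, 2).reshape(seq_len, d_model) @ w_o

def init_params(obs_dim, d_model, d_ff, n_actions, seed=0):
    """Hypothetical random initialization; a real agent would learn these weights."""
    rng = np.random.default_rng(seed)
    shapes = {
        "w_in": (obs_dim, d_model),
        "w_q": (d_model, d_model), "w_k": (d_model, d_model),
        "w_v": (d_model, d_model), "w_o": (d_model, d_model),
        "w_ff1": (d_model, d_ff), "w_ff2": (d_ff, d_model),
        "w_head": (d_model, n_actions),
    }
    return {name: rng.normal(0.0, 0.1, size=shape) for name, shape in shapes.items()}

def transformer_q_values(obs_window, params, n_heads=2):
    """Map a (seq_len, obs_dim) window of past observations to Q-values for the latest step."""
    x = obs_window @ params["w_in"]          # linear embedding of each observation
    x = x + sinusoidal_positions(*x.shape)   # add absolute position encodings
    attn = multi_head_attention(x, params["w_q"], params["w_k"],
                                params["w_v"], params["w_o"], n_heads)
    x = layer_norm(x + attn)                 # residual connection + layer norm
    ff = np.maximum(0.0, x @ params["w_ff1"]) @ params["w_ff2"]  # position-wise feed-forward (ReLU)
    x = layer_norm(x + ff)                   # second residual connection + layer norm
    return x[-1] @ params["w_head"]          # linear head: one Q-value per action

# Usage: 6 past steps of a 4-dimensional observation, 2 discrete actions
params = init_params(obs_dim=4, d_model=8, d_ff=16, n_actions=2)
window = np.random.default_rng(1).normal(size=(6, 4))
q = transformer_q_values(window, params)
action = int(np.argmax(q))  # greedy action
```

Reading the Q-values from the last position means the attention layer can weigh every earlier observation in the window when scoring the current step, which is the kind of temporal dependency the abstract credits Transformers with capturing.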