QT-TDM: Planning With Transformer Dynamics Model and Autoregressive Q-Learning
Mostafa Kotb, Cornelius Weber, Muhammad Burhan Hafez, Stefan Wermter

TL;DR
This paper introduces QT-TDM, a novel reinforcement learning approach combining Transformer-based environment modeling with autoregressive Q-learning, enhancing long-term planning and efficiency in continuous control tasks.
Contribution
The paper proposes integrating Transformer Dynamics Models with a Q-Transformer for improved long-horizon planning and computational efficiency in RL.
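To make the "autoregressive Q-learning" part concrete, here is a minimal sketch of greedy action selection from an autoregressive, discretized Q-function. It assumes each action dimension is split into a fixed number of bins and that Q-values for one dimension are conditioned on the bins already chosen for earlier dimensions. The names `greedy_action`, `q_fn`, and `toy_q` are hypothetical, and the toy Q-function is a hand-written stand-in, not the paper's Q-Transformer.

```python
import numpy as np

def greedy_action(q_fn, state, action_dim, num_bins, low=-1.0, high=1.0):
    """Pick a continuous action one dimension at a time from discretized Q-values."""
    bin_centers = np.linspace(low, high, num_bins)
    chosen_bins = []
    for _ in range(action_dim):
        # q_fn returns num_bins Q-values for the next action dimension,
        # conditioned on the state and on the bins chosen so far.
        q_values = q_fn(state, chosen_bins)
        chosen_bins.append(int(np.argmax(q_values)))
    return bin_centers[chosen_bins], chosen_bins

def toy_q(state, prev_bins, num_bins=5):
    """Hypothetical stand-in: prefers the bin centre closest to -state[d]."""
    d = len(prev_bins)  # index of the action dimension being decided
    centers = np.linspace(-1.0, 1.0, num_bins)
    return -(centers + state[d]) ** 2

action, bins = greedy_action(toy_q, np.array([0.5, -1.0]), action_dim=2, num_bins=5)
```

Choosing one dimension at a time keeps the number of Q-values evaluated at `action_dim * num_bins` rather than `num_bins ** action_dim`, which is the usual motivation for this kind of factorization.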
Findings
QT-TDM outperforms existing Transformer-based RL models.
It achieves higher sample efficiency in continuous control tasks.
The method enables fast, real-time inference with reduced computational costs.
Abstract
Inspired by the success of the Transformer architecture in natural language processing and computer vision, we investigate the use of Transformers in Reinforcement Learning (RL), specifically in modeling the environment's dynamics using Transformer Dynamics Models (TDMs). We evaluate the capabilities of TDMs for continuous control in real-time planning scenarios with Model Predictive Control (MPC). While Transformers excel in long-horizon prediction, their tokenization mechanism and autoregressive nature lead to costly planning over long horizons, especially as the environment's dimensionality increases. To alleviate this issue, we use a TDM for short-term planning, and learn an autoregressive discrete Q-function using a separate Q-Transformer (QT) model to estimate a long-term return beyond the short-horizon planning. Our proposed method, QT-TDM, integrates the robust predictive…
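The idea in the abstract, short-horizon planning with a learned model plus a value estimate for what lies beyond the horizon, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the planner is plain random shooting (the abstract does not say which MPC optimizer is used), and `toy_dynamics`, `toy_reward`, and `toy_value` are hypothetical stand-ins for the TDM, the reward prediction, and the Q-Transformer's terminal estimate.

```python
import numpy as np

def plan_action(state, dynamics, reward, terminal_value, horizon=3,
                num_candidates=256, action_dim=2, gamma=0.99, seed=0):
    """Random-shooting MPC: score short rollouts plus a terminal value estimate."""
    rng = np.random.default_rng(seed)
    # Candidate action sequences, shape (num_candidates, horizon, action_dim).
    actions = rng.uniform(-1.0, 1.0, size=(num_candidates, horizon, action_dim))
    states = np.repeat(state[None, :], num_candidates, axis=0)
    returns = np.zeros(num_candidates)
    for t in range(horizon):
        a = actions[:, t]
        returns += (gamma ** t) * reward(states, a)   # short-horizon rewards
        states = dynamics(states, a)                  # model rollout step
    # Long-term return beyond the planning horizon.
    returns += (gamma ** horizon) * terminal_value(states)
    best = int(np.argmax(returns))
    # Execute only the first action of the best sequence, then replan.
    return actions[best, 0], returns[best]

# Hypothetical toy stand-ins for the learned models (assumptions, not the paper's).
def toy_dynamics(s, a):
    return s + 0.1 * a

def toy_reward(s, a):
    return -np.sum(s ** 2, axis=-1)

def toy_value(s):
    return -10.0 * np.sum(s ** 2, axis=-1)

state = np.array([1.0, -1.0])
action, score = plan_action(state, toy_dynamics, toy_reward, toy_value)
```

Because the terminal term summarizes the return after `horizon` steps, the rollout itself can stay short, which is where the planning cost savings described in the abstract would come from.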
Taxonomy
Topics: Fault Detection and Control Systems · Neural Networks and Applications · Manufacturing Process and Optimization
Methods: Attention Is All You Need · Adam · Label Smoothing · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dense Connections
