QT-TDM: Planning With Transformer Dynamics Model and Autoregressive Q-Learning

Mostafa Kotb; Cornelius Weber; Muhammad Burhan Hafez; Stefan Wermter

arXiv:2407.18841·cs.LG·November 19, 2024·1 cites

TL;DR

This paper introduces QT-TDM, a reinforcement learning approach that pairs a Transformer Dynamics Model for short-horizon planning with an autoregressive Q-Transformer that estimates long-term return, improving long-horizon planning and efficiency in continuous control tasks.

Contribution

The paper proposes integrating Transformer Dynamics Models with a Q-Transformer for improved long-horizon planning and computational efficiency in RL.

Findings

01. QT-TDM outperforms existing Transformer-based RL models.

02. It achieves higher sample efficiency in continuous control tasks.

03. It enables fast, real-time inference at reduced computational cost.

Abstract

Inspired by the success of the Transformer architecture in natural language processing and computer vision, we investigate the use of Transformers in Reinforcement Learning (RL), specifically in modeling the environment's dynamics using Transformer Dynamics Models (TDMs). We evaluate the capabilities of TDMs for continuous control in real-time planning scenarios with Model Predictive Control (MPC). While Transformers excel in long-horizon prediction, their tokenization mechanism and autoregressive nature lead to costly planning over long horizons, especially as the environment's dimensionality increases. To alleviate this issue, we use a TDM for short-term planning, and learn an autoregressive discrete Q-function using a separate Q-Transformer (QT) model to estimate a long-term return beyond the short-horizon planning. Our proposed method, QT-TDM, integrates the robust predictive…
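The idea of planning over a short horizon and then adding a learned estimate of long-term return can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the planner is plain random shooting (the abstract does not say which MPC planner is used), and `dynamics_fn`, `reward_fn`, and `value_fn` are hypothetical stand-ins for the TDM's predictions and the Q-Transformer's return estimate, not the authors' models.

```python
import numpy as np

def plan_with_terminal_value(state, dynamics_fn, reward_fn, value_fn,
                             horizon=3, num_candidates=256, action_dim=1,
                             gamma=0.99, rng=None):
    """Random-shooting MPC: score short rollouts plus a discounted terminal value.

    All callables are hypothetical placeholders for learned models.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Candidate action sequences in [-1, 1], shape (N, H, action_dim).
    actions = rng.uniform(-1.0, 1.0, size=(num_candidates, horizon, action_dim))
    states = np.repeat(state[None, :], num_candidates, axis=0)
    returns = np.zeros(num_candidates)
    for t in range(horizon):
        a_t = actions[:, t, :]
        # Accumulate discounted predicted reward, then step the model.
        returns += (gamma ** t) * reward_fn(states, a_t)
        states = dynamics_fn(states, a_t)
    # Bootstrap beyond the planning horizon with the value estimate.
    returns += (gamma ** horizon) * value_fn(states)
    best = int(np.argmax(returns))
    # MPC executes only the first action of the best sequence.
    return actions[best, 0, :], returns[best]


# Toy stand-ins (hypothetical): a 1-D point that should move toward the origin.
def toy_dynamics(s, a):
    return s + 0.1 * a

def toy_reward(s, a):
    return -np.abs(s[:, 0])

def toy_value(s):
    return -10.0 * np.abs(s[:, 0])

rng = np.random.default_rng(0)
first_action, score = plan_with_terminal_value(
    np.array([1.0]), toy_dynamics, toy_reward, toy_value, rng=rng)
```

In this toy setup the terminal value dominates the score, so a three-step horizon is enough to pick an action that moves toward the origin. That is the role the abstract assigns to the long-term return estimate: it lets the planner keep rollouts short while still accounting for what happens after them.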

Taxonomy

Topics: Fault Detection and Control Systems · Neural Networks and Applications · Manufacturing Process and Optimization

Methods: Attention Is All You Need · Adam · Label Smoothing · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dense Connections