Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang

TL;DR
This paper introduces Multi-Agent Transformer (MAT), a novel sequence modeling approach for cooperative multi-agent reinforcement learning that achieves superior performance, efficiency, and adaptability across various complex benchmarks.
Contribution
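The paper presents MAT, an architecture that reformulates cooperative MARL as a sequence modeling problem. The reformulation turns the joint policy search into a sequential decision process with linear time complexity, comes with performance guarantees, and is trained via online interaction.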
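To make the complexity claim concrete, here is a minimal sketch of sequential action selection, assuming a toy setting that is not taken from the paper. Agents choose actions one at a time, each conditioned on the actions already chosen, so the number of candidate evaluations grows as n × |A| rather than |A|^n. The function names `decode_actions_sequentially` and `toy_score`, and the scoring rule itself, are hypothetical and chosen only for illustration; MAT uses a learned transformer decoder, not a hand-written score.

```python
from typing import Callable, List, Sequence, Tuple


def decode_actions_sequentially(
    observations: Sequence[int],
    num_actions: int,
    score: Callable[[Sequence[int], List[int], int], float],
) -> Tuple[List[int], int]:
    """Greedily pick one action per agent, in agent order.

    Each agent's choice is conditioned on the actions already chosen by the
    preceding agents. Returns the chosen actions and the number of score
    evaluations performed.
    """
    chosen: List[int] = []
    evaluations = 0
    for _agent in range(len(observations)):
        best_action, best_score = 0, float("-inf")
        for candidate in range(num_actions):
            value = score(observations, chosen, candidate)
            evaluations += 1
            if value > best_score:
                best_action, best_score = candidate, value
        chosen.append(best_action)
    return chosen, evaluations


def toy_score(
    observations: Sequence[int], previous_actions: List[int], candidate: int
) -> float:
    """Hypothetical scoring rule (an assumption, not the paper's model).

    Rewards an action that matches the current agent's observation modulo 3,
    and penalizes repeating the previous agent's action.
    """
    agent = len(previous_actions)  # index of the agent currently deciding
    value = 1.0 if candidate == observations[agent] % 3 else 0.0
    if previous_actions and candidate == previous_actions[-1]:
        value -= 2.0
    return value


if __name__ == "__main__":
    obs = [0, 0, 1, 2]
    actions, evaluations = decode_actions_sequentially(obs, 3, toy_score)
    joint_space = 3 ** len(obs)  # size of the exhaustive joint action space
    print(actions, evaluations, joint_space)
```

With four agents and three actions each, the sequential pass scores 12 candidates, while enumerating every joint action would require 81. Greedy decoding is only one possible way to use the sequential factorization; it is used here because it keeps the count easy to verify.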
Findings
MAT outperforms baselines like MAPPO and HAPPO on multiple benchmarks.
MAT demonstrates high data efficiency and few-shot learning ability.
The approach provides a new perspective linking MARL with sequence modeling.
Abstract
Large sequence models (SMs) such as the GPT series and BERT have displayed outstanding performance and generalization capabilities on vision, language, and recently reinforcement learning tasks. A natural follow-up question is how to abstract multi-agent decision making into an SM problem and benefit from the prosperous development of SMs. In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into SM problems wherein the task is to map agents' observation sequence to agents' optimal action sequence. Our goal is to build the bridge between MARL and SMs so that the modeling power of modern sequence models can be unleashed for MARL. Central to our MAT is an encoder-decoder architecture which leverages the multi-agent advantage decomposition theorem to transform the joint policy search…
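For reference, the multi-agent advantage decomposition mentioned in the abstract is usually stated along the following lines. The notation is an assumption, since the abstract defines none: $\pi$ is the joint policy, $\mathbf{o}$ the joint observation, $i_{1:n}$ an ordering of the $n$ agents, and $a^{i_m}$ the action of the $m$-th agent in that ordering.

```latex
% Joint advantage written as a sum of per-agent conditional advantages.
% Each term measures agent i_m's contribution given the actions already
% chosen by the agents that precede it in the ordering.
A_{\pi}^{i_{1:n}}\!\left(\mathbf{o}, \mathbf{a}^{i_{1:n}}\right)
  = \sum_{m=1}^{n} A_{\pi}^{i_m}\!\left(\mathbf{o}, \mathbf{a}^{i_{1:m-1}}, a^{i_m}\right)
```

Read this way, if each agent in turn picks an action with positive conditional advantage, the joint advantage is positive as well, which is what allows the joint search to be carried out one agent at a time.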
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · WordPiece · Position-Wise Feed-Forward Layer · Weight Decay · Byte Pair Encoding · Discriminative Fine-Tuning
