Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models
Yang Zhang, Chenjia Bai, Bin Zhao, Junchi Yan, Xiu Li, Xuelong Li

TL;DR
This paper introduces a novel decentralized world model for multi-agent reinforcement learning that combines local dynamics learning with centralized representation aggregation using Transformers, improving sample efficiency and performance.
Contribution
It presents the first Transformer-based multi-agent world model that effectively combines decentralized local dynamics with centralized aggregation, addressing scalability and non-stationarity issues.
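The abstract frames each agent's local dynamics as autoregressive sequence modeling over discrete tokens. The minimal sketch below shows one plausible way to lay out a single agent's token stream and derive its next-token targets. It assumes K observation tokens followed by one action token per time step. The layout, the `action_offset` vocabulary shift, and the helper names are hypothetical illustrations, not the paper's confirmed format. Rewards and any other predicted quantities are omitted.

```python
from typing import Dict, List, Tuple


def build_agent_sequence(obs_tokens: List[List[int]], actions: List[int],
                         action_offset: int) -> List[int]:
    """Interleave one agent's observation tokens and action tokens over time.

    obs_tokens[t] holds the K discrete tokens of the agent's observation at
    step t, and actions[t] is its discrete action at step t. Action ids are
    shifted by action_offset so they do not collide with observation-token
    ids (a hypothetical vocabulary layout, assumed for illustration).
    """
    if len(obs_tokens) != len(actions):
        raise ValueError("need exactly one action per observation step")
    seq: List[int] = []
    for step_tokens, action in zip(obs_tokens, actions):
        seq.extend(step_tokens)
        seq.append(action_offset + action)
    return seq


def next_token_pairs(seq: List[int]) -> Tuple[List[int], List[int]]:
    """Shift by one position: token i + 1 is predicted from tokens 0..i."""
    return seq[:-1], seq[1:]


# Each agent gets its own (decentralized) sequence; a shared model would be
# applied to every agent's sequence independently.
agent_obs: Dict[int, List[List[int]]] = {
    0: [[3, 7], [5, 1]],
    1: [[2, 2], [6, 4]],
}
agent_actions: Dict[int, List[int]] = {0: [0, 1], 1: [2, 0]}

sequences = {
    agent: build_agent_sequence(agent_obs[agent], agent_actions[agent], action_offset=8)
    for agent in agent_obs
}
inputs, targets = next_token_pairs(sequences[0])
print(sequences[0])      # [3, 7, 8, 5, 1, 9]
print(inputs, targets)   # [3, 7, 8, 5, 1] [7, 8, 5, 1, 9]
```

Because each agent's stream is built independently, one model can be shared across agents. This is the scalability argument for decentralization. Information about the other agents would have to come from the separate centralized aggregation step.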
Findings
Outperforms existing model-free approaches on SMAC
Achieves higher sample efficiency
Provides more accurate long-term predictions
Abstract
Learning a world model for model-free Reinforcement Learning (RL) agents can significantly improve the sample efficiency by learning policies in imagination. However, building a world model for Multi-Agent RL (MARL) can be particularly challenging due to the scalability issue in a centralized architecture arising from a large number of agents, and also the non-stationarity issue in a decentralized architecture stemming from the inter-dependency among agents. To address both challenges, we propose a novel world model for MARL that learns decentralized local dynamics for scalability, combined with a centralized representation aggregation from all agents. We cast the dynamics learning as an auto-regressive sequence modeling problem over discrete tokens by leveraging the expressive Transformer architecture, in order to model complex local dynamics across different agents and provide…
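The abstract does not spell out how observations become discrete tokens. This sketch assumes one common choice, VQ-VAE-style vector quantization. An encoder produces continuous feature vectors, and each vector is replaced by the index of its nearest entry in a learned codebook. The code uses a fixed toy codebook and hypothetical helper names. Training the codebook and encoder (for example with a straight-through gradient estimator) is not shown.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Sequence[float]],
             codebook: List[Sequence[float]]) -> List[int]:
    """Map each continuous feature vector to the index of its nearest code."""
    tokens: List[int] = []
    for f in features:
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(f, codebook[k]))
        tokens.append(nearest)
    return tokens


# Toy 2-D codebook with four entries (hypothetical values).
codebook = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
features = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.7], [0.1, 0.1]]
print(quantize(features, codebook))  # [1, 2, 3, 0]
```

The resulting integer ids are the kind of discrete tokens an autoregressive Transformer can model with a categorical (softmax) output over the codebook.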
Peer Reviews
Decision · Submitted to ICLR 2025
1. In constructing the world model, the authors consider both centralized and decentralized information. 2. The overall logic of the paper is coherent and easy to understand. 3. The paper presents extensive experiments.
1. The quality of the learned world model depends on the supervisory signal, namely trajectories generated by a superior policy and used as labels. In complex scenarios where no trajectories from an optimal policy are available, it may be difficult to learn the complete transition dynamics.
1. The integration of decentralized local dynamics learning and centralized feature aggregation is well-motivated and effectively addresses key challenges in MARL, such as scalability and non-stationarity. 2. The use of the Perceiver Transformer for centralized representation aggregation is an innovative contribution that facilitates efficient global information sharing between agents while maintaining scalability.
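The Perceiver-style aggregation mentioned above can be sketched as cross-attention. A small, fixed set of latent vectors queries the features of all agents, so the aggregated representation has the same size however many agents there are. This is a simplified sketch with these assumptions: a single head, identity query/key/value projections instead of learned weights, and a hypothetical number of latents. The paper's actual module may differ, including in its learned projections, any self-attention over the latents, and how the result is returned to each agent.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents: List[Vector], agent_feats: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product cross-attention.

    latents act as queries; agent_feats act as both keys and values
    (identity projections, a simplification). The output has one vector per
    latent, so its size does not grow with the number of agents.
    """
    d = len(agent_feats[0])
    scale = 1.0 / math.sqrt(d)
    out: List[Vector] = []
    for q in latents:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in agent_feats]
        weights = softmax(scores)
        # Weighted average of agent features, one output dimension at a time.
        out.append([sum(w * v[j] for w, v in zip(weights, agent_feats)) for j in range(d)])
    return out


latents = [[1.0, 0.0], [0.0, 1.0]]  # M = 2 latent queries (hypothetical)
for n_agents in (3, 10):
    feats = [[float(i % 2), float((i + 1) % 2)] for i in range(n_agents)]
    summary = cross_attend(latents, feats)
    print(n_agents, len(summary))  # 2 latent outputs regardless of agent count
```

Cross-attending from M latents to N agents costs on the order of M·N score computations. Full self-attention among all agents costs on the order of N². This is the scalability motivation for aggregating through a fixed latent bottleneck.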
1. **Necessity of individual components**: The authors claim that this work is “the first pioneering Transformer-based world model for multi-agent systems,” but the underlying techniques—centralized feature aggregation, the Perceiver Transformer, and autoregressive modeling of discretized tokens—are already present in the literature. More ablation experiments to demonstrate the necessity of these components would strengthen the paper. It is necessary to investigate whether it is a kind of simple
- The proposed model combines decentralized dynamics modeling with centralized representation aggregation using Transformer sequence modeling. - The paper is well-written and easy to follow. - The authors provide ablation results and an analysis of attention patterns that reveals implicit decision-making features.
- The presentation could be improved by giving the figures of experimental results captions that state short conclusions.
Taxonomy
Topics: Advanced Research in Systems and Signal Processing
Methods: Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer · Absolute Position Encodings
