Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models

Yang Zhang; Chenjia Bai; Bin Zhao; Junchi Yan; Xiu Li; Xuelong Li

arXiv:2406.15836·cs.LG·September 3, 2025

TL;DR

This paper introduces a novel decentralized world model for multi-agent reinforcement learning that combines local dynamics learning with centralized representation aggregation using Transformers, improving sample efficiency and performance.

Contribution

It presents the first Transformer-based multi-agent world model that effectively combines decentralized local dynamics with centralized aggregation, addressing scalability and non-stationarity issues.

Findings

1. Outperforms existing model-free approaches on SMAC
2. Achieves higher sample efficiency
3. Provides more accurate long-term predictions

Abstract

Learning a world model for model-free Reinforcement Learning (RL) agents can significantly improve the sample efficiency by learning policies in imagination. However, building a world model for Multi-Agent RL (MARL) can be particularly challenging due to the scalability issue in a centralized architecture arising from a large number of agents, and also the non-stationarity issue in a decentralized architecture stemming from the inter-dependency among agents. To address both challenges, we propose a novel world model for MARL that learns decentralized local dynamics for scalability, combined with a centralized representation aggregation from all agents. We cast the dynamics learning as an auto-regressive sequence modeling problem over discrete tokens by leveraging the expressive Transformer architecture, in order to model complex local dynamics across different agents and provide…
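The combination described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the cross-attention aggregator with learned latent queries, the shared output head, and every name and dimension below (`aggregate`, `local_next_token_logits`, `n_latents`, `vocab`, and so on) are assumptions chosen for illustration, and the weights are random rather than trained. It only shows the data flow: per-agent features are aggregated centrally, and each agent then predicts a distribution over its own next discrete token from its local feature plus the aggregated one.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate(agent_feats, latents, w_q, w_k, w_v):
    """Centralized aggregation (assumed form): latent queries attend over all agents."""
    q = latents @ w_q                  # (n_latents, d)
    k = agent_feats @ w_k              # (n_agents, d)
    v = agent_feats @ w_v              # (n_agents, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (n_latents, n_agents)
    return attn @ v                    # (n_latents, d), independent of n_agents

def local_next_token_logits(agent_feat, global_feat, w_out):
    """Decentralized head (assumed form): local feature plus pooled global feature."""
    x = np.concatenate([agent_feat, global_feat.mean(axis=0)])  # (2 * d,)
    return x @ w_out                   # (vocab,)

# Hypothetical sizes and random (untrained) parameters.
rng = np.random.default_rng(0)
n_agents, d, n_latents, vocab = 3, 8, 2, 16
agent_feats = rng.normal(size=(n_agents, d))   # one local feature per agent
latents = rng.normal(size=(n_latents, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
w_out = rng.normal(size=(2 * d, vocab)) * 0.1  # head shared across agents

global_feat = aggregate(agent_feats, latents, w_q, w_k, w_v)
logits = np.stack(
    [local_next_token_logits(f, global_feat, w_out) for f in agent_feats]
)
probs = softmax(logits)                # per-agent distribution over next token
next_tokens = probs.argmax(axis=-1)    # greedy choice, one token id per agent
```

In this sketch the aggregated feature has a fixed size regardless of the number of agents and does not depend on the order in which agents are listed, which is one way a centralized summary can be shared without the per-agent models growing with the team size.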

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. In constructing the world model, the authors considered both centralized information and decentralized information.
2. The overall logic of the paper is coherent and easy to understand.
3. The paper conducted extensive experiments.

Weaknesses

1. The learning results of the world model depend on the supervisory signals, specifically the trajectories generated by a superior policy used as labels. In complex scenarios, without trajectories produced by an optimal policy, it may be difficult to learn a complete dynamic transition.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The integration of decentralized local dynamics learning and centralized feature aggregation is well-motivated and effectively addresses key challenges in MARL, such as scalability and non-stationarity.
2. The use of the Perceiver Transformer for centralized representation aggregation is an innovative contribution that facilitates efficient global information sharing between agents while maintaining scalability.

Weaknesses

1. **Necessity of individual components**: The authors claim that this work is “the first pioneering Transformer-based world model for multi-agent systems,” but the underlying techniques—centralized feature aggregation, the Perceiver Transformer, and autoregressive modeling of discretized tokens—are already present in the literature. More ablation experiments to demonstrate the necessity of these components would strengthen the paper. It is necessary to investigate whether it is a kind of simple

Reviewer 03 · Rating 6 · Confidence 2

Strengths

- The proposed model combines decentralized dynamics modeling with centralized representation aggregation using Transformer sequence modeling.
- The paper is well-written and easy to follow.
- The authors provide ablation results and an analysis of attention patterns to reveal the implicit decision-making features.

Weaknesses

- The presentation could be improved by captioning the figures of experimental results with short conclusions.

Code & Models

Repositories

lucidrains/perceiver-pytorch
PyTorch · Official

Taxonomy

Topics: Advanced Research in Systems and Signal Processing

Methods: Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer · Absolute Position Encodings