DLM: Unified Decision Language Models for Offline Multi-Agent Sequential Decision Making
Zhuohui Zhang, Bin Cheng, Bin He

TL;DR
The paper introduces DLM, a unified language model framework for offline multi-agent decision making that leverages dialogue-style sequence prediction to improve generalization and robustness.
Contribution
It proposes the Decision Language Model (DLM), trained in two stages (supervised fine-tuning followed by group relative policy optimization), to learn flexible, scalable multi-agent decision policies from offline data.
Findings
DLM outperforms strong offline MARL baselines on multiple benchmarks.
DLM demonstrates strong zero-shot generalization to unseen scenarios.
DLM effectively handles heterogeneous observations and actions in multi-agent settings.
Abstract
Building scalable and reusable multi-agent decision policies from offline datasets remains a challenge in offline multi-agent reinforcement learning (MARL), as existing methods often rely on fixed observation formats and action spaces that limit generalization. In contrast, large language models (LLMs) offer a flexible modeling interface that can naturally accommodate heterogeneous observations and actions. Motivated by this, we propose the Decision Language Model (DLM), which formulates multi-agent decision making as a dialogue-style sequence prediction problem under the centralized training with decentralized execution paradigm. DLM is trained in two stages: a supervised fine-tuning phase, which leverages dialogue-style datasets for centralized training with inter-agent context and generates executable actions from offline trajectories, followed by a group relative policy optimization…
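The abstract is cut off before it describes the second stage. In its commonly used form, group relative policy optimization (GRPO) samples a group of responses for the same prompt, scores each one, and normalizes each reward against the group's mean and standard deviation. This removes the need for a learned value function. The minimal sketch below assumes that standard formulation. The function names, the use of the population standard deviation (some implementations use the sample standard deviation), the clipping value of 0.2, and the omission of a KL penalty are all illustrative assumptions, not details confirmed for DLM.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Hypothetical helper: GRPO-style advantages for one group of sampled responses.

    Each reward is centred on the group mean and scaled by the group's
    population standard deviation (an assumption; variants differ).
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("group must contain at least one reward")
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n  # population variance
    std = math.sqrt(var)
    # eps keeps the division finite when every reward in the group is identical
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Hypothetical helper: PPO-style clipped objective term for one response (to maximize).

    ratio is pi_new(response) / pi_old(response); the KL penalty that GRPO
    usually adds is deliberately omitted in this sketch.
    """
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Taking the minimum gives a pessimistic bound on the policy improvement
    return min(ratio * advantage, clipped_ratio * advantage)
```

For example, a group of four sampled joint-action responses with rewards [1.0, 0.0, 0.0, 1.0] yields advantages of roughly [1, -1, -1, 1]. Above-average responses are pushed up and below-average ones are pushed down, all relative to the same prompt.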
