DLM: Unified Decision Language Models for Offline Multi-Agent Sequential Decision Making

Zhuohui Zhang; Bin Cheng; Bin He

arXiv:2604.23557·cs.MA·April 28, 2026

DLM: Unified Decision Language Models for Offline Multi-Agent Sequential Decision Making

Zhuohui Zhang, Bin Cheng, Bin He

PDF

TL;DR

The paper introduces DLM, a unified language model framework for offline multi-agent decision making that leverages dialogue-style sequence prediction to improve generalization and robustness.

Contribution

It proposes a novel decision language model trained via supervised fine-tuning and policy optimization, enabling flexible, scalable multi-agent decision policies from offline data.

Findings

01

DLM outperforms strong offline MARL baselines on multiple benchmarks.

02

DLM demonstrates strong zero-shot generalization to unseen scenarios.

03

DLM effectively handles heterogeneous observations and actions in multi-agent settings.

Abstract

Building scalable and reusable multi-agent decision policies from offline datasets remains a challenge in offline multi-agent reinforcement learning (MARL), as existing methods often rely on fixed observation formats and action spaces that limit generalization. In contrast, large language models (LLMs) offer a flexible modeling interface that can naturally accommodate heterogeneous observations and actions. Motivated by this, we propose the Decision Language Model (DLM), which formulates multi-agent decision making as a dialogue-style sequence prediction problem under the centralized training with decentralized execution paradigm. DLM is trained in two stages: a supervised fine-tuning phase, which leverages dialogue-style datasets for centralized training with inter-agent context and generates executable actions from offline trajectories, followed by a group relative policy optimization…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.