Closed-Loop Vision-Language Planning for Multi-Agent Coordination

Zhiyuan Li; Wenshuai Zhao; Joni Pajarinen

arXiv:2502.10148·cs.AI·May 6, 2026

TL;DR

COMPASS is a multi-agent framework that uses Vision-Language Models (VLMs) for decentralized, interpretable decision-making and reports improved performance over MARL baselines on the SMACv2 benchmark.

Contribution

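COMPASS introduces a VLM-based approach that combines a structured multi-hop communication protocol with code-based strategies stored in a skill library, targeting the sample-efficiency, interpretability, and generalization limitations of prior MARL methods.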

Findings

01. Outperforms state-of-the-art MARL baselines on SMACv2.

02. Achieves a 57% win rate in Protoss 5v5, outperforming QMIX.

03. Demonstrates effective multi-hop communication for coordination.

Abstract

Cooperative multi-agent reinforcement learning (MARL) struggles with sample efficiency, interpretability, and generalization. While Large Language Models (LLMs) offer powerful planning capabilities, their application has been hampered by a reliance on text-only inputs and a failure to handle the non-Markovian, partially observable nature of multi-agent tasks. We introduce COMPASS, a multi-agent framework that overcomes these limitations by integrating Vision-Language Models (VLMs) for decentralized, closed-loop decision-making. COMPASS dynamically generates and refines interpretable, code-based strategies stored in a skill library that is bootstrapped from expert demonstrations. To ensure robust coordination, it propagates entity information through a structured multi-hop communication protocol, allowing teams to build a coherent understanding from partial observations. Evaluated on the…
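
The multi-hop propagation of entity information mentioned in the abstract can be pictured with a small sketch. This is not the COMPASS implementation: the names `EntityInfo`, `merge`, and `propagate`, the keep-the-freshest merge rule, and the synchronous round structure are all assumptions made for illustration. Each agent starts with only the entities it observes itself, and in every hop it merges what its direct neighbors knew at the start of that hop.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EntityInfo:
    """Hypothetical record for one observed entity (assumed fields)."""
    entity_id: str
    position: Tuple[float, float]
    timestep: int  # step at which the entity was last seen


View = Dict[str, EntityInfo]  # entity_id -> most recent info known to an agent


def merge(own: View, incoming: View) -> View:
    """Combine two views, keeping the most recent record per entity."""
    merged = dict(own)
    for eid, info in incoming.items():
        if eid not in merged or info.timestep > merged[eid].timestep:
            merged[eid] = info
    return merged


def propagate(local_views: Dict[str, View],
              neighbors: Dict[str, List[str]],
              hops: int) -> Dict[str, View]:
    """Run `hops` synchronous rounds of neighbor-to-neighbor sharing."""
    views = {agent: dict(view) for agent, view in local_views.items()}
    for _ in range(hops):
        # Snapshot so that information travels exactly one hop per round.
        snapshot = {agent: dict(view) for agent, view in views.items()}
        for agent, nbrs in neighbors.items():
            for nbr in nbrs:
                views[agent] = merge(views[agent], snapshot[nbr])
    return views


# Three agents in a chain a0 - a1 - a2; only the end agents see an enemy.
local = {
    "a0": {"e1": EntityInfo("e1", (1.0, 2.0), 3)},
    "a1": {},
    "a2": {"e2": EntityInfo("e2", (5.0, 5.0), 3)},
}
chain = {"a0": ["a1"], "a1": ["a0", "a2"], "a2": ["a1"]}

after_one = propagate(local, chain, hops=1)
after_two = propagate(local, chain, hops=2)
```

In this toy chain, one hop is enough for the middle agent to learn about both enemies, but the end agents need a second hop before they share the same picture. That is the sense in which multiple hops let a team assemble a coherent view from partial observations.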

Code & Models

Repositories

https://stellar-entremet-1720bb.netlify.app
github
