TL;DR
COMPASS is a multi-agent framework that uses Vision-Language Models (VLMs) for decentralized, interpretable decision-making and outperforms MARL baselines on complex benchmarks such as SMACv2.
Contribution
It introduces a VLM-based approach that combines structured multi-hop communication with interpretable, code-based strategies, addressing the sample-efficiency, interpretability, and generalization limitations of prior MARL methods.
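To make the idea of a library of code-based strategies concrete, here is a minimal Python sketch. It is not the COMPASS implementation. The `Skill` and `SkillLibrary` classes, the convention that each skill defines a function named `policy`, the observation format, and the `FOCUS_FIRE` seed skill (standing in for a strategy bootstrapped from expert demonstrations) are all hypothetical. Refinement is modeled simply as replacing a skill's source and bumping its version.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def compile_policy(source: str) -> Callable[[dict], str]:
    """Execute skill source in a fresh namespace and return its `policy` function."""
    # WARNING: exec runs arbitrary code; a real system must sandbox generated code.
    namespace: Dict[str, object] = {}
    exec(source, namespace)
    policy = namespace.get("policy")
    if not callable(policy):
        raise ValueError("skill source must define a callable named `policy`")
    return policy


@dataclass
class Skill:
    """A code-based strategy: readable source plus its compiled function."""
    name: str
    source: str
    version: int = 1
    fn: Callable[[dict], str] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self.fn = compile_policy(self.source)


class SkillLibrary:
    """Hypothetical store of named, versioned skills."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def add(self, name: str, source: str) -> Skill:
        # Re-adding an existing name counts as a refinement: bump the version.
        previous = self._skills.get(name)
        version = previous.version + 1 if previous else 1
        skill = Skill(name, source, version)
        self._skills[name] = skill
        return skill

    def run(self, name: str, obs: dict) -> str:
        return self._skills[name].fn(obs)

    def names(self) -> List[str]:
        return sorted(self._skills)


# Hypothetical seed skill, standing in for a strategy bootstrapped from demonstrations.
FOCUS_FIRE = (
    "def policy(obs):\n"
    "    # Attack the visible enemy with the lowest health.\n"
    "    enemies = obs.get('enemies', [])\n"
    "    if not enemies:\n"
    "        return 'move_forward'\n"
    "    target = min(enemies, key=lambda e: e['health'])\n"
    "    return 'attack_' + str(target['id'])\n"
)

if __name__ == "__main__":
    library = SkillLibrary()
    library.add("focus_fire", FOCUS_FIRE)
    obs = {"enemies": [{"id": 3, "health": 40}, {"id": 7, "health": 15}]}
    print(library.run("focus_fire", obs))  # attack_7
```

Keeping strategies as source text is what makes them inspectable: a person can read `focus_fire` directly and see why it picked enemy 7. Executing model-generated code with `exec` is unsafe outside a sandbox, which this sketch does not provide.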
Findings
Outperforms state-of-the-art MARL baselines on SMACv2.
Achieves a 57% win rate in Protoss 5v5, outperforming QMIX.
Demonstrates effective multi-hop communication for coordination.
Abstract
Cooperative multi-agent reinforcement learning (MARL) struggles with sample efficiency, interpretability, and generalization. While Large Language Models (LLMs) offer powerful planning capabilities, their application has been hampered by a reliance on text-only inputs and a failure to handle the non-Markovian, partially observable nature of multi-agent tasks. We introduce COMPASS, a multi-agent framework that overcomes these limitations by integrating Vision-Language Models (VLMs) for decentralized, closed-loop decision-making. COMPASS dynamically generates and refines interpretable, code-based strategies stored in a skill library that is bootstrapped from expert demonstrations. To ensure robust coordination, it propagates entity information through a structured multi-hop communication protocol, allowing teams to build a coherent understanding from partial observations. Evaluated on the…
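The abstract describes propagating entity information over multiple communication hops so that agents with partial views converge on a shared picture. The Python sketch below illustrates that general idea under assumptions of our own: each agent holds a hypothetical belief map from entity id to `(x, y, timestep)`, agents exchange beliefs synchronously with fixed neighbors, and conflicts are resolved by keeping the most recent observation. The paper's actual message structure may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: entity id -> (x, y, timestep at which it was observed).
Entity = Tuple[float, float, int]
Belief = Dict[str, Entity]


def merge(own: Belief, incoming: Belief) -> Belief:
    """Combine two belief maps, keeping the freshest observation of each entity."""
    out = dict(own)
    for eid, ent in incoming.items():
        if eid not in out or ent[2] > out[eid][2]:
            out[eid] = ent
    return out


def multi_hop_propagate(
    beliefs: List[Belief], neighbors: List[List[int]], hops: int
) -> List[Belief]:
    """Run `hops` synchronous rounds in which each agent merges its neighbors' beliefs."""
    current = [dict(b) for b in beliefs]
    for _ in range(hops):
        # Every agent reads the messages its neighbors held at the end of the previous round.
        previous = current
        current = []
        for i, belief in enumerate(previous):
            merged = belief
            for j in neighbors[i]:
                merged = merge(merged, previous[j])
            current.append(merged)
    return current


if __name__ == "__main__":
    # Three agents in a chain 0 - 1 - 2; each end of the chain sees a different enemy.
    beliefs = [
        {"e1": (4.0, 2.0, 5)},
        {"e1": (3.5, 2.0, 3)},  # stale view of e1
        {"e2": (9.0, 7.0, 5)},
    ]
    neighbors = [[1], [0, 2], [1]]
    one_hop = multi_hop_propagate(beliefs, neighbors, hops=1)
    two_hop = multi_hop_propagate(beliefs, neighbors, hops=2)
    print(sorted(one_hop[0]), sorted(two_hop[0]))  # ['e1'] ['e1', 'e2']
    print(one_hop[1]["e1"])  # (4.0, 2.0, 5)
```

With `hops=1`, agent 0 still knows nothing about `e2`; with `hops=2`, that observation has crossed both links of the chain. In general, `k` rounds let an observation reach every agent within `k` links of the observer, which is why a single broadcast round is not enough for agents at opposite ends of a team.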
