MA-VLCM: A Vision Language Critic Model for Value Estimation of Policies in Multi-Agent Team Settings
Shahil Shaik, Aditya Parameshwaran, Anshul Nayak, Jonathon M. Smereka, and Yue Wang

TL;DR
This paper introduces MA-VLCM, a framework that leverages pretrained vision-language models as critics in multi-agent reinforcement learning, enhancing sample efficiency and enabling deployment on resource-limited robots.
Contribution
The paper proposes replacing learned critics with pretrained vision-language models fine-tuned for multi-agent value estimation, improving efficiency and generalization.
Findings
Achieves good zero-shot return estimation in multi-agent scenarios.
Significantly improves sample efficiency in policy training.
Enables deployment on resource-constrained robotic systems.
Abstract
Multi-agent reinforcement learning (MARL) commonly relies on a centralized critic to estimate the value function. However, learning such a critic from scratch is highly sample-inefficient and often lacks generalization across environments. At the same time, large vision-language-action models (VLAs) trained on internet-scale data exhibit strong multimodal reasoning and zero-shot generalization capabilities, yet directly deploying them for robotic execution remains computationally prohibitive, particularly in heterogeneous multi-robot systems with diverse embodiments and resource constraints. To address these challenges, we propose Multi-Agent Vision-Language-Critic Models (MA-VLCM), a framework that replaces the learned centralized critic in MARL with a pretrained vision-language model fine-tuned to evaluate multi-agent behavior. MA-VLCM acts as a centralized critic conditioned on…
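In actor-critic MARL, the centralized critic's role during policy training is to supply value estimates from which the learner computes advantages for the agents' policy updates. Replacing a learned critic with a pretrained model changes where those values come from, not how they are consumed. The sketch below is a minimal illustration under our own assumptions. It shows where such values would plug into a standard advantage computation, using generalized advantage estimation (GAE) as a common choice. GAE is not necessarily what MA-VLCM uses. `critic` is a hypothetical callable standing in for the fine-tuned vision-language critic. `joint_obs`, `team_advantages`, and the hyperparameters are illustrative names, and the exact inputs MA-VLCM conditions on are not reproduced here.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a pretrained vision-language critic: maps one joint
# (team-level) observation, e.g. a shared camera frame plus a task description,
# to a scalar value estimate. Not the paper's actual interface.
ValueFn = Callable[[object], float]


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Generalized advantage estimation over a single rollout.

    `values` holds len(rewards) + 1 entries: the last one is the value of
    the state reached after the final step, used for bootstrapping.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # If the episode ended at step t, do not bootstrap from the next state.
        nonterminal = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    return advantages


def team_advantages(
    critic: ValueFn,
    joint_obs: Sequence[object],
    team_rewards: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    # Centralized critic: one value per joint observation, so every agent on
    # the team receives the same advantage signal at each time step.
    values = [critic(obs) for obs in joint_obs]
    return compute_gae(team_rewards, values, dones, gamma, lam)


if __name__ == "__main__":
    # Toy rollout with a constant critic as a placeholder baseline.
    adv = team_advantages(lambda obs: 0.5, ["o0", "o1", "o2", "o3"],
                          [0.0, 0.0, 1.0], [False, False, True],
                          gamma=1.0, lam=1.0)
    print(adv)
```

Because the critic appears only as a function from observations to scalars, the policy-update side of the loop is unchanged whether values come from a network trained from scratch or from a frozen, fine-tuned vision-language model. In the latter case, no critic parameters need to be learned alongside the policies.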
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
