MA-VLCM: A Vision Language Critic Model for Value Estimation of Policies in Multi-Agent Team Settings

Shahil Shaik, Aditya Parameshwaran, Anshul Nayak, Jonathon M. Smereka, and Yue Wang

arXiv:2603.15418 · cs.RO · March 17, 2026

Open Access

TL;DR

This paper introduces MA-VLCM, a framework that leverages pretrained vision-language models as critics in multi-agent reinforcement learning, enhancing sample efficiency and enabling deployment on resource-limited robots.

Contribution

The paper proposes replacing learned critics with pretrained vision-language models fine-tuned for multi-agent value estimation, improving efficiency and generalization.

Findings

01. Achieves good zero-shot return estimation in multi-agent scenarios.

02. Significantly improves sample efficiency in policy training.

03. Enables deployment on resource-constrained robotic systems.

Abstract

Multi-agent reinforcement learning (MARL) commonly relies on a centralized critic to estimate the value function. However, learning such a critic from scratch is highly sample-inefficient and often lacks generalization across environments. At the same time, large vision-language-action models (VLAs) trained on internet-scale data exhibit strong multimodal reasoning and zero-shot generalization capabilities, yet directly deploying them for robotic execution remains computationally prohibitive, particularly in heterogeneous multi-robot systems with diverse embodiments and resource constraints. To address these challenges, we propose Multi-Agent Vision-Language-Critic Models (MA-VLCM), a framework that replaces the learned centralized critic in MARL with a pretrained vision-language model fine-tuned to evaluate multi-agent behavior. MA-VLCM acts as a centralized critic conditioned on…
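To make the role of a centralized critic concrete, the sketch below shows how any value estimator, whether learned from scratch or backed by a pretrained model, can be plugged into a standard one-step temporal-difference advantage computation. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Critic` type, the function names, and the stub critic are all hypothetical, and the joint observation is treated as an opaque object.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a centralized critic maps a joint observation
# (all agents' inputs, treated here as an opaque object) to a scalar value.
Critic = Callable[[object], float]


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_advantages(
    critic: Critic,
    observations: Sequence[object],
    rewards: Sequence[float],
    gamma: float,
) -> List[float]:
    """One-step TD advantages: A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    Assumes `observations` has one more entry than `rewards`
    (the final entry is the observation after the last step).
    """
    if len(observations) != len(rewards) + 1:
        raise ValueError("expected len(observations) == len(rewards) + 1")
    values = [critic(obs) for obs in observations]
    return [
        rewards[t] + gamma * values[t + 1] - values[t]
        for t in range(len(rewards))
    ]


# Stub critic for illustration only: returns the observation itself as the value.
# A real critic would be a learned network or a fine-tuned pretrained model.
def stub_critic(obs: object) -> float:
    return float(obs)  # type: ignore[arg-type]


advantages = td_advantages(stub_critic, [1.0, 2.0, 3.0], [1.0, 1.0], gamma=0.5)
returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```

Because the policy update only sees `critic` through this call signature, swapping the value estimator does not require changes to the advantage computation itself.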

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning