VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Li Kang, Xiufeng Song, Heng Zhou, Yiran Qin, Jie Yang, Xiaohong Liu, Philip Torr, Lei Bai, Zhenfei Yin

TL;DR
This paper introduces VIKI-R, a reinforcement learning framework for embodied multi-agent cooperation, and evaluates it on the new VIKI-Bench benchmark, where it shows improved coordination and visual reasoning across diverse robot types.
Contribution
The work presents VIKI-Bench, a hierarchical benchmark for embodied multi-agent cooperation, and VIKI-R, an RL-based method that fine-tunes vision-language models for multi-agent coordination.
Findings
VIKI-R outperforms baseline methods across all task levels.
Reinforcement learning fosters emergent compositional cooperation among heterogeneous agents.
VIKI-Bench provides a comprehensive platform for evaluating embodied multi-agent visual reasoning.
Abstract
Coordinating multiple embodied agents in dynamic environments remains a core challenge in artificial intelligence, requiring both perception-driven reasoning and scalable cooperation strategies. While recent works have leveraged large language models (LLMs) for multi-agent planning, a few have begun to explore vision-language models (VLMs) for visual reasoning. However, these VLM-based approaches remain limited in their support for diverse embodiment types. In this work, we introduce VIKI-Bench, the first hierarchical benchmark tailored for embodied multi-agent cooperation, featuring three structured levels: agent activation, task planning, and trajectory perception. VIKI-Bench includes diverse robot embodiments, multi-view visual observations, and structured supervision signals to evaluate reasoning grounded in visual inputs. To demonstrate the utility of VIKI-Bench, we propose VIKI-R,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Reinforcement Learning in Robotics
