GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games

Aoran Mei; Jianhua Wang; Guo-Niu Zhu; Zhongxue Gan

arXiv:2405.13751 · cs.RO · May 24, 2024

Open Access

TL;DR

GameVLM introduces a multi-agent framework that combines visual-language models with zero-sum game theory to improve robotic task planning. It targets challenges such as hallucination and semantic complexity and reports an average success rate of 83.3% on real robots.

Contribution

This work presents a multi-agent framework in which VLM-based decision agents propose task plans, a VLM-based expert agent evaluates them, and zero-sum game theory is used to reconcile the agents' decisions in robotic task planning.

Findings

1. Achieved an average success rate of 83.3% on real robots.

2. Resolves inconsistencies among the agents' task plans using zero-sum game theory.

3. Demonstrates improved decision-making in complex robotic tasks.

Abstract

With their prominent scene understanding and reasoning capabilities, pre-trained visual-language models (VLMs) such as GPT-4V have attracted increasing attention in robotic task planning. Compared with traditional task planning strategies, VLMs are strong in multimodal information parsing and code generation and show remarkable efficiency. Although VLMs demonstrate great potential in robotic task planning, they face challenges such as hallucination, semantic complexity, and limited context. To handle these issues, this paper proposes a multi-agent framework, GameVLM, to enhance the decision-making process in robotic task planning. In this study, VLM-based decision and expert agents are presented to conduct the task planning. Specifically, decision agents are used to plan the task, and the expert agent is employed to evaluate these task plans. Zero-sum game theory is introduced…
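The abstract describes decision agents that propose task plans and an expert agent that evaluates them, but it is truncated before explaining how zero-sum game theory is applied. The sketch below is therefore only an illustration under stated assumptions, not the authors' method: it assumes a simple pairwise contest in which the expert scores two plans, the better-scored agent gains +1 and the other loses 1 (so each contest's payoffs sum to zero), and the plan with the highest cumulative payoff is selected. The functions `zero_sum_tournament`, `select_plan`, and `toy_expert`, the action strings, and the scoring rule are all hypothetical stand-ins; in GameVLM the evaluator would be a VLM rather than a hand-written function.

```python
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical types: a "plan" is a list of action strings, and the expert is
# any function that scores a plan (standing in for a VLM-based evaluator).
Plan = List[str]
Expert = Callable[[Plan], float]


def zero_sum_tournament(plans: Dict[str, Plan], expert: Expert) -> Dict[str, int]:
    """Run pairwise contests between the decision agents' plans.

    In each contest the expert scores both plans; the higher-scored agent
    gains +1 and the other loses 1, so every contest's payoffs sum to zero.
    Ties leave both agents unchanged.
    """
    payoff = {agent: 0 for agent in plans}
    for a, b in combinations(plans, 2):
        score_a, score_b = expert(plans[a]), expert(plans[b])
        if score_a > score_b:
            payoff[a] += 1
            payoff[b] -= 1
        elif score_b > score_a:
            payoff[b] += 1
            payoff[a] -= 1
    return payoff


def select_plan(plans: Dict[str, Plan], expert: Expert) -> str:
    """Return the agent whose plan earned the highest cumulative payoff."""
    payoff = zero_sum_tournament(plans, expert)
    # Iterating over sorted names makes tie-breaking deterministic.
    return max(sorted(payoff), key=lambda agent: payoff[agent])


def toy_expert(plan: Plan) -> float:
    """Hypothetical expert: the cup must be grasped before it is placed;
    among valid plans, shorter plans score higher."""
    if (
        "grasp(cup)" in plan
        and "place(cup)" in plan
        and plan.index("grasp(cup)") < plan.index("place(cup)")
    ):
        return -float(len(plan))
    return -1000.0  # invalid plan


plans = {
    "agent_a": ["move_to(cup)", "grasp(cup)", "move_to(shelf)", "place(cup)"],
    "agent_b": ["move_to(cup)", "place(cup)"],  # skips the grasp step
    "agent_c": ["move_to(cup)", "grasp(cup)", "move_to(table)",
                "move_to(shelf)", "place(cup)"],
}

if __name__ == "__main__":
    print(zero_sum_tournament(plans, toy_expert))
    print(select_plan(plans, toy_expert))
```

With these toy plans, agent_a beats both rivals, agent_c beats only the invalid plan from agent_b, and the payoffs are 2, -2, and 0, which sum to zero as the zero-sum assumption requires. A pairwise contest was chosen only because it is the simplest game whose payoffs are zero-sum by construction.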

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · AI-based Problem Solving and Planning