VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation

Wentao Zhao; Jiaming Chen; Ziyu Meng; Donghui Mao; Ran Song; Wei Zhang

arXiv:2407.09829 · cs.RO · July 16, 2024 · 1 citation

TL;DR

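VLMPC couples a vision-language model (VLM) with model predictive control (MPC) for robotic manipulation: the VLM samples candidate action sequences from a goal image or a language instruction, and a lightweight action-conditioned video prediction model forecasts their outcomes. The authors report that it outperforms state-of-the-art methods on public benchmarks and performs well on real-world manipulation tasks.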

Contribution

This paper introduces VLMPC, a novel framework combining vision-language models with MPC for improved perception and control in robotic manipulation.

Findings

01 Outperforms state-of-the-art methods on public benchmarks

02 Shows excellent real-world robotic manipulation performance

03 Effectively integrates perception and planning via hierarchical cost functions

Abstract

Although Model Predictive Control (MPC) can effectively predict the future states of a system and is thus widely used in robotic manipulation tasks, it lacks the capability of environmental perception, which leads to failure in some complex scenarios. To address this issue, we introduce Vision-Language Model Predictive Control (VLMPC), a robotic manipulation framework that takes advantage of the powerful perception capability of vision-language models (VLMs) and integrates it with MPC. Specifically, we propose a conditional action sampling module that takes as input a goal image or a language instruction and leverages a VLM to sample a set of candidate action sequences. Then, a lightweight action-conditioned video prediction model is designed to generate a set of future frames conditioned on the candidate action sequences. VLMPC produces the optimal action sequence with the…
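The pipeline in the abstract (sample candidate action sequences, predict their outcomes, keep the best-scoring sequence) has the general shape of sampling-based MPC. Below is a minimal Python sketch of that loop under heavy assumptions: the VLM-guided sampler is replaced by Gaussian noise, the video prediction model by a toy 2-D point model, and the cost by squared distance to a goal point. Every function name and parameter here is hypothetical and is not taken from the paper or from the ppjmchen/vlmpc repository.

```python
import random


def sample_action_sequences(num_candidates, horizon, rng, scale=0.1):
    """Stand-in for conditional action sampling (ASSUMPTION: plain Gaussian noise).

    Returns num_candidates sequences, each a list of horizon (dx, dy) actions.
    """
    return [
        [(rng.gauss(0.0, scale), rng.gauss(0.0, scale)) for _ in range(horizon)]
        for _ in range(num_candidates)
    ]


def predict_states(state, actions):
    """Stand-in for the prediction model (ASSUMPTION: toy additive point model)."""
    x, y = state
    trajectory = []
    for dx, dy in actions:
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory


def cost(trajectory, goal):
    """Toy cost (ASSUMPTION): squared distance from the last predicted state to the goal."""
    x, y = trajectory[-1]
    return (x - goal[0]) ** 2 + (y - goal[1]) ** 2


def mpc_step(state, goal, rng, num_candidates=64, horizon=5):
    """Score every candidate sequence and return the first action of the cheapest one."""
    candidates = sample_action_sequences(num_candidates, horizon, rng)
    best = min(candidates, key=lambda seq: cost(predict_states(state, seq), goal))
    return best[0]  # receding horizon: execute one action, then replan


def run(start, goal, steps=30, seed=0):
    """Run the replanning loop for a fixed number of steps and return the final state."""
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        dx, dy = mpc_step(state, goal, rng)
        state = (state[0] + dx, state[1] + dy)
    return state
```

Calling `run((0.0, 0.0), (1.0, 1.0))` replans at every step and moves the point toward the goal. In this sketch only the first action of the chosen sequence is executed before replanning, which is the usual receding-horizon pattern; whether VLMPC executes one action or several per cycle is not stated in the truncated abstract.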

Code & Models

Repositories

ppjmchen/vlmpc · jax · Official

Taxonomy

Topics: Advanced Control Systems Optimization · Robot Manipulation and Learning