Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation

Yunhai Feng; Jiaming Han; Zhuoran Yang; Xiangyu Yue; Sergey Levine; Jianlan Luo

arXiv:2502.16707·cs.RO·February 25, 2025

Open Access

TL;DR

This paper introduces a reflection-based framework that enhances vision-language models' physical reasoning for complex, multi-stage robotic manipulation, significantly improving their performance over existing models and methods.

Contribution

It proposes a novel test-time reflection mechanism that iteratively refines VLMs' reasoning by imagining future states, leading to better handling of long-horizon manipulation tasks.
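
The propose, imagine, reflect loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the three-piece assembly task, the alphabetical `propose` policy (standing in for the VLM), and the rollout-based `imagine` function (standing in for the generative model) are all hypothetical names and simplifications chosen for illustration.

```python
from typing import FrozenSet, List, Optional

State = FrozenSet[str]  # set of pieces already placed

# Hypothetical toy task (assumption, not from the paper):
# once "cap" is placed, "peg" can no longer be inserted.
PIECES = ["cap", "peg", "ring"]
BLOCKS = {"cap": {"peg"}}


def valid_actions(state: State) -> List[str]:
    """Pieces that are neither placed nor blocked by a placed piece."""
    blocked = set()
    for placed in state:
        blocked |= BLOCKS.get(placed, set())
    return [p for p in PIECES if p not in state and p not in blocked]


def step(state: State, action: str) -> State:
    return state | {action}


def is_goal(state: State) -> bool:
    return state == frozenset(PIECES)


def propose(state: State) -> Optional[str]:
    """Stand-in for the VLM: naively picks the first valid piece."""
    actions = valid_actions(state)
    return actions[0] if actions else None


def imagine(state: State, action: str, horizon: int) -> State:
    """Stand-in for the generative model: apply the action, then
    roll the naive policy forward for up to `horizon` further steps."""
    future = step(state, action)
    for _ in range(horizon):
        nxt = propose(future)
        if nxt is None:
            break
        future = step(future, nxt)
    return future


def reflect(state: State, action: str, horizon: int = 3) -> str:
    """Keep the proposed action if its imagined future reaches the goal;
    otherwise switch to the first alternative whose imagined future does."""
    if is_goal(imagine(state, action, horizon)):
        return action
    for alt in valid_actions(state):
        if alt != action and is_goal(imagine(state, alt, horizon)):
            return alt
    return action  # no better alternative found


def run(use_reflection: bool) -> List[str]:
    state: State = frozenset()
    plan: List[str] = []
    while True:
        action = propose(state)
        if action is None:
            break
        if use_reflection:
            action = reflect(state, action)
        state = step(state, action)
        plan.append(action)
    return plan


if __name__ == "__main__":
    print("without reflection:", run(False))
    print("with reflection:   ", run(True))
```

In this toy setting the naive policy places "cap" first and gets stuck, while the reflective variant imagines that dead end, revises its first action to "peg", and completes the assembly. The real system would replace the hand-written rules with learned models.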

Findings

01 Outperforms several state-of-the-art commercial VLMs on multi-stage manipulation tasks

02 Significantly improves reasoning over long horizons

03 Outperforms other post-training approaches such as Monte Carlo Tree Search (MCTS)

Abstract

Solving complex long-horizon robotic manipulation problems requires sophisticated high-level planning capabilities, the ability to reason about the physical world, and the ability to reactively choose appropriate motor skills. Vision-language models (VLMs) pretrained on Internet data could in principle offer a framework for tackling such problems. However, in their current form, VLMs lack both the nuanced understanding of intricate physics required for robotic manipulation and the ability to reason over long horizons to address error compounding issues. In this paper, we introduce a novel test-time computation framework that enhances VLMs' physical reasoning capabilities for multi-stage manipulation tasks. At its core, our approach iteratively improves a pretrained VLM with a "reflection" mechanism: it uses a generative model to imagine future world states, leverages these predictions to guide action…

Taxonomy

Topics: Robotic Path Planning Algorithms