Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
Yunhai Feng, Jiaming Han, Zhuoran Yang, Xiangyu Yue, Sergey Levine, Jianlan Luo

TL;DR
This paper introduces a reflection-based framework that enhances vision-language models' physical reasoning for complex, multi-stage robotic manipulation, significantly improving their performance over existing models and methods.
Contribution
It proposes a novel test-time reflection mechanism that iteratively refines VLMs' reasoning by imagining future states, leading to better handling of long-horizon manipulation tasks.
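To make the propose, imagine, reflect loop concrete, here is a minimal toy sketch. It is not the authors' implementation: the 1-D world and the functions `propose`, `imagine`, `reflect`, `reflective_step`, and `plan` are all hypothetical stand-ins, with a fixed naive rule in place of the VLM and simple addition in place of the generative model.

```python
from typing import List

State = int   # toy world (assumption): a position on a line
Action = int  # toy actions (assumption): -1 (left) or +1 (right)


def propose(state: State, goal: State) -> Action:
    """Stand-in for the VLM policy: deliberately naive, always moves right."""
    return 1


def imagine(state: State, action: Action) -> State:
    """Stand-in for the generative model that predicts the next state."""
    return state + action


def reflect(state: State, goal: State, action: Action, future: State) -> Action:
    """Keep the action unless the imagined future ends farther from the goal."""
    if abs(future - goal) > abs(state - goal):
        return -action  # revise the proposal
    return action


def reflective_step(state: State, goal: State, horizon: int = 3) -> Action:
    """Propose an action, imagine up to `horizon` steps ahead, then reflect."""
    action = propose(state, goal)
    future = imagine(state, action)
    for _ in range(horizon - 1):
        if future == goal:  # stop imagining once the goal is reached
            break
        future = imagine(future, propose(future, goal))
    return reflect(state, goal, action, future)


def plan(state: State, goal: State, max_steps: int = 10) -> List[State]:
    """Apply reflective steps until the goal is reached or steps run out."""
    trajectory = [state]
    for _ in range(max_steps):
        if state == goal:
            break
        state = state + reflective_step(state, goal)
        trajectory.append(state)
    return trajectory


print(plan(5, 2))  # the naive proposal is revised at every step
print(plan(0, 2))  # the naive proposal is kept
```

In this toy setting, reflection only changes the outcome when the imagined rollout shows the proposed action leading away from the goal; otherwise the original proposal is kept.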
Findings
Outperforms state-of-the-art VLMs in manipulation tasks
Significantly improves reasoning over long horizons
Demonstrates effectiveness over other post-training methods like MCTS
Abstract
Solving complex long-horizon robotic manipulation problems requires sophisticated high-level planning capabilities, the ability to reason about the physical world, and the ability to reactively choose appropriate motor skills. Vision-language models (VLMs) pretrained on Internet data could in principle offer a framework for tackling such problems. However, in their current form, VLMs lack both the nuanced understanding of intricate physics required for robotic manipulation and the ability to reason over long horizons to address error compounding issues. In this paper, we introduce a novel test-time computation framework that enhances VLMs' physical reasoning capabilities for multi-stage manipulation tasks. At its core, our approach iteratively improves a pretrained VLM with a "reflection" mechanism: it uses a generative model to imagine future world states, leverages these predictions to guide action…
Taxonomy
Topics: Robotic Path Planning Algorithms
