Seeing Farther and Smarter: Value-Guided Multi-Path Reflection for VLM Policy Optimization

Yanting Yang, Shenyuan Gao, Qingwen Bu, Li Chen, Dimitris N. Metaxas

arXiv:2602.19372 · cs.RO · February 24, 2026

Open Access

TL;DR

This paper introduces a value-guided multi-path reflection framework for robotic manipulation with vision-language models (VLMs). It improves success rates and efficiency by explicitly modeling action advantages, exploring multiple future paths, and exiting early once confident.

Contribution

The paper proposes a novel test-time computation framework that decouples state evaluation from action generation, enabling more robust and efficient decision-making in robotic manipulation tasks.
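The decoupling of state evaluation from action generation can be illustrated with a minimal Python sketch. It assumes a VLM has already proposed candidate actions, each paired with a predicted (foresight) next state, and that a separate value function scores states. `Candidate`, `select_by_advantage`, and `value_fn` are hypothetical names, and the one-step advantage estimate A(s, a) ≈ V(s') - V(s) is a standard reinforcement-learning approximation chosen for illustration, not necessarily the paper's formulation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Candidate:
    action: str           # hypothetical action proposed by the VLM (action generation)
    predicted_state: str  # hypothetical foresight prediction of the state after `action`


def select_by_advantage(
    state: str,
    candidates: Sequence[Candidate],
    value_fn: Callable[[str], float],
) -> Tuple[Candidate, float]:
    """Pick the candidate with the highest one-step advantage estimate.

    State evaluation (value_fn) is kept separate from action generation
    (the candidates), and A(s, a) is approximated as V(s') - V(s).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    baseline = value_fn(state)  # V(s) for the current state
    scored = [(value_fn(c.predicted_state) - baseline, c) for c in candidates]
    best_adv, best = max(scored, key=lambda pair: pair[0])
    return best, best_adv


# Usage with a toy lookup table standing in for a learned value model.
values = {"start": 0.2, "cup_grasped": 0.6, "cup_knocked_over": 0.1}
candidates = [
    Candidate("grasp cup", "cup_grasped"),
    Candidate("push cup", "cup_knocked_over"),
]
best, adv = select_by_advantage("start", candidates, values.__getitem__)
print(best.action, round(adv, 2))  # grasp cup 0.4
```

Keeping the scorer as a plain callable means the value model can be swapped or retrained without touching the component that proposes actions.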

Findings

01 Achieved a 24.6% improvement in success rate over baselines.

02 Reduced inference time by 56.5%.

03 Enhanced robustness through multi-path exploration and confidence-based early exit.
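The confidence-based early exit can be sketched in the same spirit. This minimal Python example assumes candidate future paths are produced one at a time and that each receives a scalar confidence score. `explore_with_early_exit`, `confidence_fn`, and the default threshold of 0.8 are hypothetical choices for illustration, not names or values from the paper. Stopping once a path is good enough is one plausible way such a mechanism could cut inference time.

```python
from typing import Callable, Iterable, Optional, Tuple


def explore_with_early_exit(
    paths: Iterable[str],
    confidence_fn: Callable[[str], float],
    threshold: float = 0.8,
) -> Tuple[Optional[str], float, int]:
    """Evaluate candidate future paths one at a time, keeping the best so far,
    and stop as soon as one path's confidence reaches `threshold`.

    Returns (best_path, best_confidence, number_of_paths_evaluated).
    """
    best_path: Optional[str] = None
    best_conf = float("-inf")
    evaluated = 0
    for path in paths:  # with a lazy iterable, paths after the exit are never generated
        evaluated += 1
        conf = confidence_fn(path)  # hypothetical confidence score in [0, 1]
        if conf > best_conf:
            best_path, best_conf = path, conf
        if conf >= threshold:
            break  # early exit: skip the remaining (costly) rollouts
    return best_path, best_conf, evaluated


# Usage: three candidate paths, consumed lazily.
scores = {"path_a": 0.3, "path_b": 0.85, "path_c": 0.95}
result = explore_with_early_exit(iter(["path_a", "path_b", "path_c"]), scores.__getitem__)
print(result)  # ('path_b', 0.85, 2)
```

Note the trade-off: the early exit returns `path_b` even though `path_c` scores higher, exchanging a possibly better path for fewer evaluations.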

Abstract

Solving complex, long-horizon robotic manipulation tasks requires a deep understanding of physical interactions, reasoning about their long-term consequences, and precise high-level planning. Vision-Language Models (VLMs) offer a general perceive-reason-act framework for this goal. However, previous approaches using reflective planning to guide VLMs in correcting actions encounter significant limitations. These methods rely on inefficient and often inaccurate implicit learning of state-values from noisy foresight predictions, evaluate only a single greedy future, and suffer from substantial inference latency. To address these limitations, we propose a novel test-time computation framework that decouples state evaluation from action generation. This provides a more direct and fine-grained supervisory signal for robust decision-making. Our method explicitly models the advantage of an…

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning