Vision-Language-Policy Model for Dynamic Robot Task Planning

Jin Wang; Kim Tien Ly; Jacques Cloete; Nikos Tsagarakis; Ioannis Havoutis

arXiv:2512.19178 · cs.RO · December 23, 2025

TL;DR

This paper introduces a vision-language-policy model that enables robots to interpret natural language instructions, reason over their environment, and adapt their task strategies dynamically, improving flexibility and generalization in real-world scenarios.

Contribution

It presents the Vision-Language-Policy (VLP) model, built on a vision-language model fine-tuned on real-world data, which allows robots to interpret instructions, reason about scenes, and adapt their behavior policies dynamically during task execution.

Findings

01

The model effectively interprets semantic instructions.

02

Enables dynamic adjustment of task strategies.

03

Demonstrates strong generalization across different robots and tasks.

Abstract

Bridging the gap between natural language commands and autonomous execution in unstructured environments remains an open challenge for robotics. This requires robots to perceive and reason over the current task scene through multiple modalities, and to plan their behaviors to achieve their intended goals. Traditional robotic task-planning approaches often struggle to bridge low-level execution with high-level task reasoning, and cannot dynamically update task strategies when instructions change during execution, which ultimately limits their versatility and adaptability to new tasks. In this work, we propose a novel language model-based framework for dynamic robot task planning. Our Vision-Language-Policy (VLP) model, based on a vision-language model fine-tuned on real-world data, can interpret semantic instructions and integrate reasoning over the current task scene to generate…
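The abstract's central mechanism is updating the task strategy when the instruction changes mid-execution. A minimal sketch of such a replanning loop follows, assuming a hypothetical planner interface that maps an instruction and a scene description to an ordered list of skills. `toy_planner`, `DynamicExecutor`, and the `updates` schedule are illustrative stand-ins introduced here; they are not the authors' VLP model, code, or interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical planner signature: (instruction, scene) -> ordered list of skill names.
# In the paper this role is filled by the fine-tuned VLP model; here it is a stub.
Planner = Callable[[str, dict], List[str]]


def toy_planner(instruction: str, scene: dict) -> List[str]:
    """Hypothetical rule-based stand-in for a vision-language planner."""
    target = instruction.split()[-1]  # e.g. "pick up the cup" -> "cup"
    steps = [f"locate {target}", f"grasp {target}", f"place {target}"]
    done = scene.get("done", [])
    # Skip skills already executed, so a replan starts from the current state.
    return [s for s in steps if s not in done]


@dataclass
class DynamicExecutor:
    planner: Planner
    log: List[str] = field(default_factory=list)

    def run(self, instruction: str, scene: dict,
            updates: Optional[Dict[int, str]] = None) -> List[str]:
        """Execute skills one at a time, replanning when a new instruction arrives.

        `updates` (hypothetical) maps a step index to a replacement instruction,
        simulating a user changing the command during execution.
        """
        updates = updates or {}
        plan = self.planner(instruction, scene)
        step = 0
        while plan:
            if step in updates:
                instruction = updates[step]
                plan = self.planner(instruction, scene)  # replan from current scene
                if not plan:
                    break
            skill = plan.pop(0)
            self.log.append(skill)
            # Toy state update: record the executed skill in the scene.
            scene.setdefault("done", []).append(skill)
            step += 1
        return self.log
```

For example, `DynamicExecutor(toy_planner).run("pick up the cup", {}, updates={1: "pick up the bottle"})` executes `locate cup`, then switches goals and executes the bottle plan. The design choice of replanning between skills, rather than only at the start, is what lets a changed instruction take effect without restarting the task.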

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Reinforcement Learning in Robotics