VLM-driven Behavior Tree for Context-aware Task Planning
Naoki Wake, Atsushi Kanehira, Jun Takamatsu, Kazuhiro Sasabuchi, Katsushi Ikeuchi

TL;DR
This paper introduces a novel framework that uses Vision-Language Models to generate and edit Behavior Trees for robots, enabling context-aware operations in complex visual environments, validated in a real-world cafe setting.
Contribution
It presents a new VLM-driven approach for interactive, visual condition-based Behavior Tree generation and editing for robot task planning.
Findings
Successfully demonstrated in a real-world cafe scenario
Enabled context-aware robot operations based on visual conditions
Showed feasibility and identified limitations of the approach
Abstract
The use of Large Language Models (LLMs) for generating Behavior Trees (BTs) has recently gained attention in the robotics community, yet remains in its early stages of development. In this paper, we propose a novel framework that leverages Vision-Language Models (VLMs) to interactively generate and edit BTs that address visual conditions, enabling context-aware robot operations in visually complex environments. A key feature of our approach lies in the conditional control through self-prompted visual conditions. Specifically, the VLM generates BTs with visual condition nodes, where conditions are expressed as free-form text. Another VLM process integrates the text into its prompt and evaluates the conditions against real-world images during robot execution. We validated our framework in a real-world cafe scenario, demonstrating both its feasibility and limitations.
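The conditional-control idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the node classes, the `stub_evaluator` function, the cup-collecting task, and the dictionary used in place of a camera image are all hypothetical stand-ins. In the framework described above, the condition text would be written by the VLM that generates the tree, and a second VLM process would judge that text against a real image at execution time.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


# Hypothetical evaluator signature: (free-form condition text, image) -> bool.
# In the paper this role is played by a VLM; here it is an injected callable.
ConditionEvaluator = Callable[[str, dict], bool]


@dataclass
class VisualCondition:
    """Leaf node whose condition is stored as free-form text."""
    text: str
    evaluator: ConditionEvaluator

    def tick(self, image: dict) -> Status:
        ok = self.evaluator(self.text, image)
        return Status.SUCCESS if ok else Status.FAILURE


@dataclass
class Action:
    """Leaf node that records its name instead of moving a robot."""
    name: str
    log: List[str]

    def tick(self, image: dict) -> Status:
        self.log.append(self.name)
        return Status.SUCCESS


@dataclass
class Sequence:
    """Succeeds only if every child succeeds, evaluated left to right."""
    children: list

    def tick(self, image: dict) -> Status:
        for child in self.children:
            if child.tick(image) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


@dataclass
class Fallback:
    """Succeeds as soon as one child succeeds, evaluated left to right."""
    children: list

    def tick(self, image: dict) -> Status:
        for child in self.children:
            if child.tick(image) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


def stub_evaluator(condition_text: str, image: dict) -> bool:
    # Assumption: the "image" is a dict of scene facts keyed by condition text.
    # A real system would send the text and a camera frame to a VLM instead.
    return bool(image.get(condition_text, False))


def build_tree(log: List[str], evaluator: ConditionEvaluator) -> Fallback:
    # Hypothetical cafe task: collect the cup only if it looks empty, else wait.
    cup_empty = VisualCondition("the cup on the table is empty", evaluator)
    return Fallback([
        Sequence([cup_empty, Action("collect_cup", log)]),
        Action("wait", log),
    ])


if __name__ == "__main__":
    log: List[str] = []
    tree = build_tree(log, stub_evaluator)
    tree.tick({"the cup on the table is empty": True})
    tree.tick({})
    print(log)
```

Keeping the evaluator as an injected callable means the tree structure stays independent of how the condition is judged, so the stub could be swapped for a VLM-backed function without changing the nodes.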
Taxonomy
Topics: Data Stream Mining Techniques · Time Series Analysis and Forecasting · Anomaly Detection Techniques and Applications
Methods: Softmax · Attention Is All You Need
