Hierarchical Prompting with Dual LLM Modules for Robotic Task and Motion Planning
Karolina Źróbek, Tessa Pulli, Paweł Gajewski, Antonio Galiza Cerdeira Gonzalez, Bipin Indurkhya

TL;DR
This paper introduces a hierarchical, language-driven framework for robotic task and motion planning, combining high-level language understanding with low-level spatial reasoning to enhance human-robot interaction.
Contribution
It proposes a dual LLM module system with specialized sub-modules for spatial reasoning, improving natural language command execution in robotics.
Findings
Achieved 86% success rate across 24 diverse test scenarios.
Integrated YOLOX-GDRNet for object detection and pose estimation.
Demonstrated effective handling of complex spatial and high-level commands.
Abstract
We present a hierarchical language-driven framework for robotic task and motion planning to improve natural, intuitive human-robot interaction in service and assistance scenarios. The proposed system employs two large language model (LLM) modules: a high-level planning agent and a low-level spatial reasoning sub-module. The primary agent processes natural language commands and generates action sequences using a ReAct-style prompt, interacting with tools for object perception and manipulation (e.g., pick, place, release). For precise spatial placement, such as interpreting "place the mug next to the plate", a separate sub-prompting module handles 3D reasoning based on object geometry and scene layout. The system integrates YOLOX-GDRNet for object detection and pose estimation, along with a motion execution stub. We evaluated the system in 24 test scenarios, ranging from simple spatial…
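As a rough illustration of the two-level split described in the abstract, the following Python sketch separates a high-level step executor from a low-level placement routine. It is not the authors' implementation: the names `SceneObject`, `place_next_to`, and `run_plan`, the 2 cm gap, the choice of the +x direction for "next to", and the hard-coded scene are all assumptions made for this example. The LLM calls are replaced by a fixed plan and a geometric rule, and the tools are stubs in the spirit of the motion execution stub the abstract mentions.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SceneObject:
    name: str
    position: Vec3  # centre of the bounding box, in metres (assumed frame)
    size: Vec3      # axis-aligned extents (x, y, z), in metres


def place_next_to(moved: SceneObject, reference: SceneObject, gap: float = 0.02) -> Vec3:
    """Hypothetical stand-in for the low-level spatial sub-module.

    Returns a target centre that puts `moved` beside `reference` along +x,
    separated by `gap`, and resting on the same support surface.
    """
    x = reference.position[0] + reference.size[0] / 2 + gap + moved.size[0] / 2
    y = reference.position[1]
    support_z = reference.position[2] - reference.size[2] / 2
    z = support_z + moved.size[2] / 2
    return (x, y, z)


def run_plan(plan, tools):
    """Execute (tool_name, kwargs) steps in order and collect the observations."""
    return [tools[name](**kwargs) for name, kwargs in plan]


# Assumed scene; in the paper this information comes from object detection
# and pose estimation.
scene = {
    "mug": SceneObject("mug", (0.20, 0.30, 0.05), (0.08, 0.08, 0.10)),
    "plate": SceneObject("plate", (0.50, 0.00, 0.01), (0.20, 0.20, 0.02)),
}
held = {"name": None}


def pick(name):
    held["name"] = name
    return f"picked {name}"


def place(target):
    obj = scene[held["name"]]
    obj.position = target
    return f"placed {obj.name}"


def release():
    name, held["name"] = held["name"], None
    return f"released {name}"


tools = {"pick": pick, "place": place, "release": release}

# A real high-level agent would obtain these steps from the LLM one at a time,
# reading each observation before choosing the next action; here the plan for
# "place the mug next to the plate" is fixed by hand.
target = place_next_to(scene["mug"], scene["plate"])
plan = [("pick", {"name": "mug"}), ("place", {"target": target}), ("release", {})]
observations = run_plan(plan, tools)
```

Keeping the placement computation in its own function mirrors the idea that the high-level agent only decides which object goes where, while the geometric details are delegated to a specialised component.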
