STEP Planner: Constructing cross-hierarchical subgoal tree as an embodied long-horizon task planner
Tianxing Zhou, Zhirui Wang, Haojia Ao, Guangyan Chen, Boyang Xing, Jingwen Cheng, Yi Yang, Yufeng Yue

TL;DR
The STEP framework enhances long-horizon robot task planning by constructing a hierarchical subgoal tree using LLMs and real-time feedback, significantly improving success rates over existing methods.
Contribution
This paper introduces a novel hierarchical subgoal tree construction method combining LLM-based decomposition and real-time termination feedback for embodied tasks.
Findings
Achieves up to a 34% success rate on the VirtualHome benchmark.
Attains 25% success rate on real robots.
Outperforms state-of-the-art methods in long-horizon tasks.
Abstract
The ability to perform reliable long-horizon task planning is crucial for deploying robots in real-world environments. However, directly employing Large Language Models (LLMs) as action sequence generators often results in low success rates due to their limited reasoning ability for long-horizon embodied tasks. In the STEP framework, we construct a subgoal tree through a pair of closed-loop models: a subgoal decomposition model and a leaf node termination model. Within this framework, we develop a hierarchical tree structure that spans from coarse to fine resolutions. The subgoal decomposition model leverages a foundation LLM to break down complex goals into manageable subgoals, thereby spanning the subgoal tree. The leaf node termination model provides real-time feedback based on environmental states, determining when to terminate the tree spanning and ensuring each leaf node can be…
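The interplay between the two models described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SubgoalNode`, `build_subgoal_tree`, `decompose`, `is_terminal`, and `max_depth` are hypothetical, the state is a plain dictionary, and the LLM-based decomposition model and the termination model are replaced by toy lookup functions so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubgoalNode:
    """One node of the subgoal tree; a node without children is a leaf."""
    goal: str
    children: List["SubgoalNode"] = field(default_factory=list)


def build_subgoal_tree(
    goal: str,
    state: Dict[str, str],
    decompose: Callable[[str, Dict[str, str]], List[str]],
    is_terminal: Callable[[str, Dict[str, str]], bool],
    max_depth: int = 5,
    depth: int = 0,
) -> SubgoalNode:
    """Expand a goal from coarse to fine until every leaf is judged terminal.

    `decompose` stands in for the subgoal decomposition model and
    `is_terminal` for the leaf node termination model (both hypothetical).
    `max_depth` is an assumed safety cap, not something the abstract states.
    """
    node = SubgoalNode(goal)
    if is_terminal(goal, state) or depth >= max_depth:
        return node  # stop spanning the tree at this leaf
    for sub in decompose(goal, state):
        child = build_subgoal_tree(
            sub, state, decompose, is_terminal, max_depth, depth + 1
        )
        node.children.append(child)
    return node


def leaves(node: SubgoalNode) -> List[str]:
    """Collect leaf goals left to right; these form the executable plan."""
    if not node.children:
        return [node.goal]
    plan: List[str] = []
    for child in node.children:
        plan.extend(leaves(child))
    return plan


# Toy stand-ins for the two models (assumptions, for illustration only).
TOY_DECOMPOSITIONS = {
    "make coffee": ["get mug", "brew coffee"],
    "get mug": ["walk to cabinet", "grab mug"],
    "brew coffee": ["put mug under machine", "switch on machine"],
}


def toy_decompose(goal: str, state: Dict[str, str]) -> List[str]:
    return TOY_DECOMPOSITIONS.get(goal, [])


def toy_is_terminal(goal: str, state: Dict[str, str]) -> bool:
    # A goal with no known decomposition is treated as directly executable.
    return goal not in TOY_DECOMPOSITIONS


tree = build_subgoal_tree(
    "make coffee", {"robot": "kitchen"}, toy_decompose, toy_is_terminal
)
plan = leaves(tree)
```

In this sketch the termination check runs before each expansion, so the tree stops growing as soon as a node is judged executable in the current state. A real system would query an LLM in `decompose` and use environment observations in `is_terminal`.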
Taxonomy
Topics: Artificial Intelligence in Games
