Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model
Wenjiang Xu, Cindy Wang, Rui Fang, Mingkang Zhang, Lusong Li, Jing Xu, Jiayuan Gu, Zecui Zeng, and Rui Chen

TL;DR
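The paper introduces Embodied Tree of Thoughts (EToT), a planning framework for robot manipulation that uses a physics-based digital twin as its world model. It expands a search tree with candidate plans drawn from semantic and spatial analysis and with VLM-based failure diagnosis, with the goal of improving long-horizon task success.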
Contribution
EToT integrates a physics-based interactive digital twin with tree search, aiming for manipulation plans that stay physically consistent over long horizons, where video-generation world models tend to hallucinate.
Findings
EToT outperforms baselines on manipulation tasks.
Effective failure diagnosis and correction improve planning robustness.
Physics grounding ensures adherence to physical constraints.
Abstract
World models have emerged as a pivotal component in robot manipulation planning, enabling agents to predict future environmental states and reason about the consequences of actions before execution. While video-generation models are increasingly adopted, they often lack rigorous physical grounding, leading to hallucinations and a failure to maintain consistency in long-horizon physical constraints. To address these limitations, we propose Embodied Tree of Thoughts (EToT), a novel Real2Sim2Real planning framework that leverages a physics-based interactive digital twin as an embodied world model. EToT formulates manipulation planning as a tree search expanded through two synergistic mechanisms: (1) Priori Branching, which generates diverse candidate execution paths based on semantic and spatial analysis; and (2) Reflective Branching, which utilizes VLMs to diagnose execution failures…
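The tree search described in the abstract can be illustrated with a minimal sketch. The abstract is truncated and does not specify the search order, the scoring, or any interfaces, so everything here is an assumption made for illustration: the breadth-first expansion, the names `propose`, `simulate`, and `reflect`, and the toy drawer task are all hypothetical stand-ins for Priori Branching, the digital-twin rollout, and Reflective Branching. It is not the authors' implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Plan = Tuple[str, ...]  # a plan is an ordered tuple of action names


@dataclass
class SimResult:
    success: bool
    failure_note: str = ""  # hypothetical diagnosis string from the rollout


def tree_search(
    propose: Callable[[], List[Plan]],             # stand-in for Priori Branching
    simulate: Callable[[Plan], SimResult],         # stand-in for the digital twin
    reflect: Callable[[Plan, str], List[Plan]],    # stand-in for Reflective Branching
    max_nodes: int = 50,
) -> Optional[Plan]:
    """Breadth-first search over plans; returns the first plan that succeeds."""
    frontier = deque(propose())
    seen = set(frontier)
    expanded = 0
    while frontier and expanded < max_nodes:
        plan = frontier.popleft()
        expanded += 1
        result = simulate(plan)
        if result.success:
            return plan
        # On failure, ask the reflection step for revised plans and enqueue new ones.
        for revised in reflect(plan, result.failure_note):
            if revised not in seen:
                seen.add(revised)
                frontier.append(revised)
    return None


# Toy task (entirely hypothetical): a cup must go into a closed drawer.
GOAL: Plan = ("open_drawer", "grasp_cup", "place_cup")


def toy_propose() -> List[Plan]:
    return [("grasp_cup", "place_cup"), ("push_cup",)]


def toy_simulate(plan: Plan) -> SimResult:
    if plan == GOAL:
        return SimResult(True)
    note = "drawer_closed" if "open_drawer" not in plan else "other"
    return SimResult(False, note)


def toy_reflect(plan: Plan, note: str) -> List[Plan]:
    # Repair only the failure this toy diagnosis knows about.
    return [("open_drawer",) + plan] if note == "drawer_closed" else []


if __name__ == "__main__":
    print(tree_search(toy_propose, toy_simulate, toy_reflect))
```

In this toy run the initial candidates both fail because the drawer is closed, the reflection step prepends `open_drawer`, and the repaired first candidate then succeeds. A real system would replace the three callables with a VLM-driven proposer, a physics simulator, and a VLM-based diagnoser.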
Taxonomy
Topics: Robot Manipulation and Learning · Social Robot Interaction and HRI · Reinforcement Learning in Robotics
