WorldPlanner: Monte Carlo Tree Search and MPC with Action-Conditioned Visual World Models
R. Khorrambakht, Joaquim Ortiz-Haro, Joseph Amigo, Omar Mostafa, Daniel Dugas, Franziska Meier, and Ludovic Righetti

TL;DR
This paper introduces a model-based robotic planning approach combining visual world models, diffusion-based action sampling, and MCTS with MPC, demonstrating improved performance over behavior cloning in real-world tasks.
Contribution
It presents a novel integration of action-conditioned visual world models, diffusion-based action sampling, and Monte Carlo Tree Search for robotic planning and control.
Findings
Planning improves over behavior cloning in manipulation tasks.
Action sampling reduces hallucinations in world models during planning.
The approach is validated on three real-world robotic tasks.
Abstract
Robots must understand their environment from raw sensory inputs and reason about the consequences of their actions in it to solve complex tasks. Behavior Cloning (BC) leverages task-specific human demonstrations to learn this knowledge as end-to-end policies. However, these policies are difficult to transfer to new tasks, and generating training data is challenging because it requires careful demonstrations and frequent environment resets. In contrast to this policy-based view, in this paper we take a model-based approach: we collect a few hours of unstructured, easy-to-collect play data to learn an action-conditioned visual world model, a diffusion-based action sampler, and optionally a reward model. The world model, in combination with the action sampler and a reward model, is then used to optimize long sequences of actions with a Monte Carlo Tree Search (MCTS) planner. The…
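The planning loop described in the abstract can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the paper's implementation: the "world model" is a toy scalar function rather than a learned visual model, the "action sampler" draws uniform random actions rather than using diffusion, the "reward model" is a hand-written distance to a goal, and the search is a generic UCT-style MCTS wrapped in a receding-horizon (MPC) loop.

```python
import math
import random

# Hypothetical toy stand-ins for the three learned components (assumptions).
def world_model(state, action):
    """Predict the next state; here the state is a float shifted by the action."""
    return state + action

def action_sampler(state, rng, k=4):
    """Propose k candidate actions; uniform noise replaces a diffusion sampler."""
    return [rng.uniform(-1.0, 1.0) for _ in range(k)]

def reward_model(state, goal=3.0):
    """Score a state; here the negative distance to an assumed goal."""
    return -abs(goal - state)

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.value_sum = [], 0, 0.0

    def ucb(self, c=1.4):
        # Unvisited children are tried first.
        if self.visits == 0:
            return float("inf")
        mean = self.value_sum / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)

def mcts_plan(root_state, rng, iterations=200, max_depth=4):
    """Return the first action of the most-visited branch under the toy models."""
    root = Node(root_state)
    for _ in range(iterations):
        node, depth = root, 0
        # Selection: follow the highest-UCB child down to a leaf.
        while node.children:
            node = max(node.children, key=Node.ucb)
            depth += 1
        # Expansion: imagine sampled actions through the world model.
        if node.visits > 0 and depth < max_depth:
            for a in action_sampler(node.state, rng):
                node.children.append(Node(world_model(node.state, a), node, a))
            node = node.children[0]
        # Evaluation and backpropagation of the leaf's predicted reward.
        value = reward_model(node.state)
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return max(root.children, key=lambda n: n.visits).action

def mpc_loop(state, steps=8, seed=0):
    """Receding horizon: plan, apply only the first action, then replan."""
    rng = random.Random(seed)
    for _ in range(steps):
        action = mcts_plan(state, rng)
        state = world_model(state, action)  # stands in for executing on the robot
    return state

if __name__ == "__main__":
    print("final state:", mpc_loop(0.0))
```

In this sketch the sampler restricts the tree to a handful of proposed actions per node, so the search only evaluates states reachable through those proposals. That mirrors, in spirit only, the role the summary attributes to action sampling in keeping the planner away from regions where the world model would hallucinate.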
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Robotic Path Planning Algorithms
