3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS

Bronislav Sidik; Dror Mizrahi

arXiv:2604.11302·cs.RO·April 14, 2026


TL;DR

The paper introduces 3D-ALP, a planning method combining MCTS with a persistent 3D world model, enabling robots to maintain spatial memory and improve manipulation success over reactive policies.

Contribution

It presents a novel 3D-Anchored Lookahead Planning approach that enhances robotic manipulation by integrating persistent spatial memory with MCTS, addressing limitations of reactive policies.

Findings

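01. On the memory-required steps of a 5-step sequential reach task, 3D-ALP reaches a 0.650 success rate versus 0.006 for a greedy reactive baseline.

02. Maintaining a persistent camera-to-world (c2w) anchor lets the planner replan accurately to object positions that are occluded and no longer directly observable.

03. Structural issues in applying UCT-MCTS to robotics are identified and resolved.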
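For readers unfamiliar with UCT, the selection rule it refers to can be sketched as follows. This is the textbook UCB1-based child selection, not the paper's modified variant; the function names, the tuple layout, and the default exploration constant `c` are assumptions made for illustration.

```python
import math

def uct_score(total_value, visits, parent_visits, c=math.sqrt(2)):
    """Textbook UCT score: mean value plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children, parent_visits, c=math.sqrt(2)):
    """children: list of (action, total_value, visits); returns the chosen action."""
    best = max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits, c))
    return best[0]
```

With `c=0.0` the rule is purely greedy on the mean value; a larger `c` shifts selection toward rarely visited actions.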

Abstract

We present 3D-Anchored Lookahead Planning (3D-ALP), a System 2 reasoning engine for robotic manipulation that combines Monte Carlo Tree Search (MCTS) with a 3D-consistent world model as the rollout oracle. Unlike reactive policies that evaluate actions from the current camera frame only, 3D-ALP maintains a persistent camera-to-world (c2w) anchor that survives occlusion, enabling accurate replanning to object positions that are no longer directly observable. On a 5-step sequential reach task requiring spatial memory (Experiment E3), 3D-ALP achieves 0.650 ± 0.109 success rate on memory-required steps versus 0.006 ± 0.008 for a greedy reactive baseline (Δ = +0.645), while step 5 success reaches 0.822 against 0.000 for greedy. An ablation study (30 episodes, 3 seeds) isolates tree search spatial memory as the primary driver (+0.533, 82% of gain) with additional benefit from deeper…
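The idea of a camera-to-world anchor that survives occlusion can be illustrated with a minimal sketch. It assumes the anchor is a 4x4 rigid homogeneous transform and that object positions are cached in the world frame; the `SceneMemory` class and the helper names are hypothetical and are not taken from the paper.

```python
def mat_vec(T, p):
    """Apply a 4x4 homogeneous transform T (nested lists) to a 3D point p."""
    h = (p[0], p[1], p[2], 1.0)
    return tuple(sum(T[r][k] * h[k] for k in range(4)) for r in range(3))

def invert_rigid(T):
    """Invert a rigid transform [R | t]; the inverse is [R^T | -R^T t]."""
    R_t = [[T[c][r] for c in range(3)] for r in range(3)]
    t = [T[r][3] for r in range(3)]
    t_inv = [-sum(R_t[r][k] * t[k] for k in range(3)) for r in range(3)]
    return [R_t[r] + [t_inv[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

class SceneMemory:
    """Hypothetical store of object positions, kept in the world frame."""

    def __init__(self):
        self.world_points = {}

    def observe(self, name, p_cam, c2w):
        # Lift the camera-frame observation into the world frame once.
        self.world_points[name] = mat_vec(c2w, p_cam)

    def recall_in_camera(self, name, c2w):
        # Re-express the remembered point in the current camera frame,
        # even if the object is no longer visible from that pose.
        return mat_vec(invert_rigid(c2w), self.world_points[name])
```

Because the stored point lives in the world frame, moving the camera only changes the `c2w` argument passed to `recall_in_camera`; the memory itself does not need to be re-observed.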
