Plan in Sandbox, Navigate in Open Worlds: Learning Physics-Grounded Abstracted Experience for Embodied Navigation

Zhixuan Shen; Jiawei Du; Ziyu Guo; Han Luo; Lilan Peng; Joey Tianyi Zhou; Haonan Luo; Tianrui Li

arXiv:2605.10118·cs.RO·May 12, 2026

TL;DR

This paper introduces SAGE, a physics-grounded abstraction framework for embodied navigation that enhances transferability and success rates in open-world environments by mimicking human mental simulation.

Contribution

SAGE enables agents to learn navigation policies within simplified physics abstractions, improving transfer to real-world robots and outperforming baseline success rates.

Findings

01. Achieved 53.21% success rate on A-EQA, 9.7% higher than baseline.

02. Demonstrated encouraging transfer to physical indoor robot deployment.

03. Proposed a novel asymmetric adaptive clipping mechanism for RL stability.

Abstract

Vision-Language Models (VLMs) have demonstrated exceptional general reasoning capabilities. However, their performance in embodied navigation remains hindered by a scarcity of aligned open-world vision and robot control data. Although simulators provide a cost-effective alternative for data collection, their inherent reliance on photorealistic simulation often limits the transferability of learned policies. To this end, we propose Sandbox-Abstracted Grounded Experience (SAGE), a framework that enables agents to learn within a physics-grounded semantic abstraction rather than a photorealistic simulation, mimicking the human capacity for mental simulation, where plans are rehearsed in simplified physics abstractions before execution. The SAGE system operates via three synergistic phases: (1) Genesis: constructing…
