Tracking and Planning with Spatial World Models
Baris Kayalibay, Atanas Mirchev, Patrick van der Smagt, Justin Bayer

TL;DR
This paper presents a real-time navigation and tracking method built on differentiably rendered 3D spatial world models, enabling vision-based control in complex simulated environments with navigation success rates of up to 92%.
Contribution
It transfers differentiable rendering to model-based control. The agent plans in a learned 3D spatial world model and tracks its pose with an algorithm adapted from TSDF fusion and extended to incorporate agent dynamics.
Findings
Achieves up to 92% navigation success rate in six simulated environments based on human-designed floor plans
Operates at 15 Hz in simulated environments
Uses only image and depth observations
Abstract
We introduce a method for real-time navigation and tracking with differentiably rendered world models. Learning models for control has led to impressive results in robotics and computer games, but this success has yet to be extended to vision-based navigation. To address this, we transfer advances in the emergent field of differentiable rendering to model-based control. We do this by planning in a learned 3D spatial world model, combined with a pose estimation algorithm previously used in the context of TSDF fusion, but now tailored to our setting and improved to incorporate agent dynamics. We evaluate over six simulated environments based on complex human-designed floor plans and provide quantitative results. We achieve up to 92% navigation success rate at a frequency of 15 Hz using only image and depth observations under stochastic, continuous dynamics.
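The abstract describes pose estimation adapted from TSDF fusion, where camera pose is typically found by aligning incoming depth with depth rendered from the current model, and extended here with agent dynamics. Below is a minimal sketch of that general idea under stated assumptions, not the paper's implementation. A hypothetical 2D rectangular room stands in for the learned 3D world model. `render_depth`, `dynamics`, and `track_step` are illustrative names. The unicycle motion model, the noise scales `sigma_z` and `sigma_p`, and the use of Gauss-Newton with finite-difference Jacobians are all assumptions. Each tracking step minimizes whitened depth residuals plus a Gaussian prior centred on the dynamics prediction.

```python
import numpy as np

# Hypothetical stand-in for a learned world model: a 2D rectangular room
# (W x H metres) that "renders" expected depth along a fan of sensor rays.
W, H = 10.0, 6.0
RAY_OFFSETS = np.linspace(-np.pi / 3, np.pi / 3, 9)  # ray angles relative to heading


def render_depth(pose):
    """Expected range along each ray from pose = (x, y, theta) inside the room."""
    x, y, theta = pose
    dx, dy = np.cos(theta + RAY_OFFSETS), np.sin(theta + RAY_OFFSETS)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Distance to the vertical walls (x = 0 or x = W) and horizontal walls.
        tx = np.where(dx > 0, (W - x) / dx, np.where(dx < 0, -x / dx, np.inf))
        ty = np.where(dy > 0, (H - y) / dy, np.where(dy < 0, -y / dy, np.inf))
    return np.minimum(tx, ty)  # first wall hit along each ray


def dynamics(pose, v, omega, dt=0.1):
    """Assumed unicycle motion model: predicts the next pose from the control."""
    x, y, theta = pose
    return np.array([x + v * dt * np.cos(theta),
                     y + v * dt * np.sin(theta),
                     theta + omega * dt])


def wrap(angle):
    """Wrap an angle to [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi


def residuals(pose, depth_obs, pose_pred, sigma_z, sigma_p):
    """Stacked, whitened residuals: depth alignment plus dynamics prior."""
    r_obs = (depth_obs - render_depth(pose)) / sigma_z
    diff = pose - pose_pred
    diff[2] = wrap(diff[2])
    return np.concatenate([r_obs, diff / np.asarray(sigma_p)])


def track_step(prev_pose, v, omega, depth_obs, sigma_z=0.05,
               sigma_p=(0.1, 0.1, 0.05), iters=20, eps=1e-6):
    """One tracking update: Gauss-Newton on the MAP objective, started
    from the dynamics prediction. Returns (estimate, prediction)."""
    pose_pred = dynamics(prev_pose, v, omega)
    pose = pose_pred.copy()
    for _ in range(iters):
        r = residuals(pose, depth_obs, pose_pred, sigma_z, sigma_p)
        # Forward-difference Jacobian of the residuals w.r.t. the pose.
        J = np.empty((r.size, 3))
        for k in range(3):
            step = np.zeros(3)
            step[k] = eps
            J[:, k] = (residuals(pose + step, depth_obs, pose_pred,
                                 sigma_z, sigma_p) - r) / eps
        # Gauss-Newton step: least-squares solve of J @ delta = -r.
        delta = np.linalg.lstsq(J, -r, rcond=None)[0]
        pose = pose + delta
        if np.linalg.norm(delta) < 1e-9:
            break
    return pose, pose_pred


# Usage: the true motion deviates from the model, and the (noise-free)
# depth observation pulls the estimate back toward the true pose.
prev_pose = np.array([3.0, 3.0, 0.2])
true_pose = dynamics(prev_pose, v=1.0, omega=0.5) + np.array([0.05, -0.04, 0.03])
estimate, prediction = track_step(prev_pose, 1.0, 0.5, render_depth(true_pose))
print("prediction error:", np.linalg.norm(prediction - true_pose))
print("estimate error:  ", np.linalg.norm(estimate - true_pose))
```

The dynamics prior keeps the estimate near the motion-model prediction when depth is weak or ambiguous, while well-conditioned depth pulls it toward the true pose; the assumed `sigma_z` and `sigma_p` set that balance.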
Taxonomy
Topics: Advanced Vision and Imaging · Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization
