FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM
Yuchen Wu, Jiahe Li, Fabio Tosi, Matteo Poggi, Jin Zheng, Xiao Bai

TL;DR
FoundationSLAM introduces a novel deep learning-based monocular dense SLAM system that integrates foundation depth models with geometric reasoning, achieving accurate, robust, and real-time dense mapping and tracking.
Contribution
The paper presents a hybrid flow network, a bi-consistent bundle adjustment layer, and a reliability-aware refinement mechanism, advancing the integration of deep learning and geometric SLAM.
Findings
Achieves superior trajectory accuracy on multiple datasets.
Provides dense reconstruction quality comparable to state-of-the-art methods.
Operates in real time at 18 FPS with strong generalization.
Abstract
We present FoundationSLAM, a learning-based monocular dense SLAM system that addresses the absence of geometric consistency in previous flow-based approaches for accurate and robust tracking and mapping. Our core idea is to bridge flow estimation with geometric reasoning by leveraging the guidance from foundation depth models. To this end, we first develop a Hybrid Flow Network that produces geometry-aware correspondences, enabling consistent depth and pose inference across diverse keyframes. To enforce global consistency, we propose a Bi-Consistent Bundle Adjustment Layer that jointly optimizes keyframe pose and depth under multi-view constraints. Furthermore, we introduce a Reliability-Aware Refinement mechanism that dynamically adapts the flow update process by distinguishing between reliable and uncertain regions, forming a closed feedback loop between matching and optimization.…
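The abstract's link between flow estimation and geometry can be made concrete with standard pinhole-camera reprojection: given a pixel's depth and the relative pose between two keyframes, the flow that geometry implies is fixed, and a forward-backward round trip gives one simple consistency residual. The sketch below is a generic illustration under those assumptions, not the paper's Hybrid Flow Network or Bi-Consistent Bundle Adjustment Layer, whose formulations the abstract does not give. All function names (`backproject`, `induced_flow`, `forward_backward_error`) and the intrinsics tuple layout are hypothetical.

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a known depth to a 3D point in the camera frame."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)

def transform(R, t, p):
    """Apply a rigid transform: R is a 3x3 nested list, t a 3-vector."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def project(p, fx, fy, cx, cy):
    """Pinhole projection of a 3D point to pixel coordinates."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)

def induced_flow(u, v, depth, R, t, intr):
    """Flow implied by depth and relative pose (frame i -> frame j).

    Returns ((du, dv), depth_in_j). `intr` is (fx, fy, cx, cy); this layout
    is an assumption made for the sketch.
    """
    q = transform(R, t, backproject(u, v, depth, *intr))
    u2, v2 = project(q, *intr)
    return (u2 - u, v2 - v), q[2]

def invert(R, t):
    """Inverse of a rigid transform: (R^T, -R^T t)."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = tuple(-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3))
    return Rt, t_inv

def forward_backward_error(u, v, depth, R, t, intr):
    """Pixel distance after mapping i -> j and back j -> i.

    With exact depth and pose this is zero; with estimated quantities it
    acts as a simple consistency residual. This is a generic check, assumed
    here for illustration only.
    """
    (du, dv), depth_j = induced_flow(u, v, depth, R, t, intr)
    R_inv, t_inv = invert(R, t)
    (bu, bv), _ = induced_flow(u + du, v + dv, depth_j, R_inv, t_inv, intr)
    return math.hypot(du + bu, dv + bv)

if __name__ == "__main__":
    intr = (500.0, 500.0, 320.0, 240.0)           # hypothetical intrinsics
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    flow, _ = induced_flow(320.0, 240.0, 2.0, identity, (0.1, 0.0, 0.0), intr)
    print("induced flow:", flow)
    print("round-trip error:",
          forward_backward_error(320.0, 240.0, 2.0, identity, (0.1, 0.0, 0.0), intr))
```

In a learned system the residual between a predicted flow and this geometry-induced flow is the kind of signal an optimization layer could minimize over pose and depth; how FoundationSLAM weights or structures that term is not stated in the abstract.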
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Robotic Path Planning Algorithms
