TL;DR
PRISM-SLAM is a real-time monocular SLAM system that integrates vision foundation model priors into a Bayesian framework to achieve scale-aware, metric-consistent localization and mapping in dynamic environments.
Contribution
It introduces a novel Plücker Ray-Distance Factor and a Dynamic Scene Uncertainty Gating mechanism to address scale ambiguity and environmental dynamics.
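As background for the Plücker Ray-Distance Factor, the sketch below shows only the standard Plücker representation of a camera ray and the point-to-ray distance it yields. It is not the paper's factor, whose exact residual is not given in this summary; the function names `plucker_from_ray` and `point_to_ray_distance` are hypothetical and chosen for illustration.

```python
import numpy as np

def plucker_from_ray(origin, direction):
    """Return Plücker coordinates (d, m) of the line through `origin` along `direction`.

    d is the unit direction and m = origin x d is the moment vector.
    """
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    m = np.cross(np.asarray(origin, dtype=float), d)
    return d, m

def point_to_ray_distance(point, d, m):
    """Perpendicular distance from `point` to the line (d, m), assuming unit d.

    Uses the identity p x d - m = (p - origin) x d, whose norm is the distance.
    """
    p = np.asarray(point, dtype=float)
    return float(np.linalg.norm(np.cross(p, d) - m))

# Illustrative values only: a ray through (0, 0, 1) along the x-axis.
d, m = plucker_from_ray([0.0, 0.0, 1.0], [1.0, 0.0, 0.0])
dist = point_to_ray_distance([5.0, 2.0, 1.0], d, m)
```

Because the distance is expressed in the same units as the 3D point, a residual built on it is sensitive to absolute scale, which is one plausible reading of why a ray-based factor can help anchor metric scale.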
Findings
Achieves nearly oracle-aligned $SE(3)$ ATE on benchmarks.
Provides verified metric trajectories without post-hoc scale correction.
Operates at 30 FPS using only RGB input.
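To make the $SE(3)$ ATE claim concrete, here is a minimal sketch of absolute trajectory error after rigid-only alignment, using the standard Kabsch procedure. It assumes trajectories are given as N×3 position arrays with known correspondences; the names `align_se3` and `ate_rmse_se3` and the synthetic helix data are illustrative and do not come from the paper. The point is that rigid alignment, unlike an alignment that also fits scale, leaves a scale error visible in the metric.

```python
import numpy as np

def align_se3(est, gt):
    """Least-squares rotation R and translation t mapping est onto gt (no scale)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections so that det(R) = +1.
    sign = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse_se3(est, gt):
    """RMSE of position error after SE(3) alignment of est to gt."""
    R, t = align_se3(est, gt)
    aligned = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))

# Synthetic helix trajectory (illustrative data only).
s = np.linspace(0.0, 4.0 * np.pi, 50)
est = np.stack([np.cos(s), np.sin(s), 0.1 * s], axis=1)

# Case 1: ground truth differs from the estimate by a rigid motion only.
a = np.deg2rad(30.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
rmse_rigid = ate_rmse_se3(est, est @ Rz.T + np.array([1.0, 2.0, 3.0]))

# Case 2: ground truth is twice as large; rigid alignment cannot absorb this.
rmse_scaled = ate_rmse_se3(est, 2.0 * est)
```

In the first case the error vanishes up to floating-point precision, while in the second it stays large, so a low $SE(3)$ ATE is evidence that the estimated trajectory is already in metric scale.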
Abstract
Monocular SLAM historically suffers from scale ambiguity and tracking failure in dynamic environments. While recent vision foundation models (VFMs) provide remarkable zero-shot depth priors, naively integrating these deterministic predictions ignores predictive uncertainty and frame-to-frame scale inconsistencies. We propose PRISM-SLAM, a real-time framework that rigorously integrates VFM priors into a structured Bayesian factor graph to achieve scale-aware, metric-consistent localization and mapping. Specifically, we introduce a Plücker Ray-Distance Factor to anchor monocular observations in absolute space within a globally consistent metric coordinate system, mathematically resolving scale drift by making the metric scale Fisher-identifiable. To handle environmental dynamics, we derive an epistemic uncertainty proxy from temporal depth consistency and formulate a Dynamic Scene…
