TL;DR
AIM-SLAM is a dense monocular SLAM framework that adaptively prioritizes informative multi-view keyframes using a geometric foundation model (VGGT). It reports state-of-the-art pose estimation and accurate dense reconstruction.
Contribution
The paper proposes the SIGMA module for adaptive view selection, combined with joint multi-view optimization, for dense SLAM built on geometric foundation models.
Findings
Achieves state-of-the-art pose estimation performance.
Provides accurate dense reconstruction on real-world datasets.
Supports ROS integration for practical deployment.
Abstract
Geometric foundation models have recently emerged as a promising approach to dense reconstruction in monocular visual simultaneous localization and mapping (SLAM). Although geometric foundation models enable SLAM to leverage a variable number of input views, previous methods remain confined to two-view pairs or fixed-length inputs and do not sufficiently consider geometric context when selecting views. To tackle this problem, we propose AIM-SLAM, a dense monocular SLAM framework that exploits adaptive and informative multi-view keyframe prioritization with dense pointmap predictions from the visual geometry grounded transformer (VGGT). Specifically, we introduce the selective information- and geometric-aware multi-view adaptation (SIGMA) module, which employs voxel overlap and information gain to retrieve a candidate set of keyframes and adaptively…
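The abstract names voxel overlap and information gain as the two signals used to retrieve candidate keyframes, but it does not define either. The following is a minimal sketch of how such a selection could work, not the authors' SIGMA implementation. Everything in it is an assumption made for illustration: the function names (`voxelize`, `overlap_ratio`, `novelty_ratio`, `select_views`), the thresholds, the greedy strategy, and the use of "fraction of voxels not yet covered" as a crude stand-in for information gain.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float) -> Set[Voxel]:
    """Map 3D points to the set of integer voxel indices they occupy."""
    return {
        (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        for x, y, z in points
    }


def overlap_ratio(query: Set[Voxel], candidate: Set[Voxel]) -> float:
    """Fraction of the query frame's voxels that the candidate also occupies."""
    if not query:
        return 0.0
    return len(query & candidate) / len(query)


def novelty_ratio(candidate: Set[Voxel], covered: Set[Voxel]) -> float:
    """Fraction of candidate voxels not yet covered (hypothetical gain proxy)."""
    if not candidate:
        return 0.0
    return len(candidate - covered) / len(candidate)


def select_views(
    query_points: Iterable[Point],
    keyframes: Dict[str, List[Point]],
    voxel_size: float = 0.1,
    min_overlap: float = 0.2,
    min_gain: float = 0.1,
    max_views: int = 4,
) -> List[str]:
    """Greedily pick keyframes that overlap the query and add new voxels.

    The number of returned views adapts to the scene: selection stops once
    no remaining candidate contributes at least `min_gain` new voxels.
    """
    query_voxels = voxelize(query_points, voxel_size)
    # Keep only keyframes that share enough geometry with the query frame.
    candidates = {
        name: voxels
        for name, pts in keyframes.items()
        if overlap_ratio(query_voxels, voxels := voxelize(pts, voxel_size))
        >= min_overlap
    }
    covered = set(query_voxels)
    selected: List[str] = []
    while candidates and len(selected) < max_views:
        best = max(candidates, key=lambda n: novelty_ratio(candidates[n], covered))
        if novelty_ratio(candidates[best], covered) < min_gain:
            break  # remaining candidates add too little new geometry
        covered |= candidates.pop(best)
        selected.append(best)
    return selected
```

In this sketch the overlap filter keeps the selection geometrically relevant to the current frame, while the novelty term discourages redundant views, so the number of selected keyframes varies with the scene rather than being fixed in advance.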
