VGGT-SLAM 2.0: Real-time Dense Feed-forward Scene Reconstruction
Dominic Maggio, Luca Carlone

TL;DR
VGGT-SLAM 2.0 is a real-time RGB feed-forward SLAM system for dense scene reconstruction. It improves on VGGT-SLAM by removing 15-degree-of-freedom drift and planar degeneracy, using VGGT attention layers to verify loop-closure candidates, and supporting open-set object detection. It is demonstrated in diverse environments and reports lower pose error than VGGT-SLAM.
Contribution
It introduces a new factor graph design that removes drift and planar degeneracy, uses one of VGGT's attention layers to verify image retrieval matches without extra training, and shows that the system can be adapted for open-set object detection while running in real time.
Findings
Achieves 23% lower pose error than VGGT-SLAM on the TUM dataset.
Runs in real time onboard a ground robot.
Applied successfully to diverse environments, including indoor and barn scenes.
Abstract
We present VGGT-SLAM 2.0, a real-time RGB feed-forward SLAM system which substantially improves upon VGGT-SLAM for incrementally aligning submaps created from VGGT. Firstly, we remove high-dimensional 15-degree-of-freedom drift and planar degeneracy from VGGT-SLAM by creating a new factor graph design while still addressing the reconstruction ambiguity of VGGT given unknown camera intrinsics. Secondly, by studying the attention layers of VGGT, we show that one of the layers is well suited to assist in image retrieval verification for free without additional training, which both enables rejecting false positive matches and allows for completing more loop closures. Finally, we conduct a suite of experiments which includes showing VGGT-SLAM 2.0 can easily be adapted for open-set object detection and demonstrating real-time performance while running online onboard a ground robot using a…
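The "15-degree-of-freedom" figure in the abstract can be unpacked with a short count. This is a sketch under an assumption not stated in the text above: that the 15 degrees of freedom are those of a general 3D projective transformation (a 4×4 homography), which is how the original VGGT-SLAM aligns submaps. The symbols H, R, t, and s are introduced here for illustration only.

```latex
% Assumption: submap alignment is modeled as a 3D projective transformation.
% A homography acting on homogeneous 3D points is an invertible 4x4 matrix
% defined only up to a nonzero scale factor:
\[
  H \in \mathbb{R}^{4 \times 4}, \qquad H \sim \lambda H \quad (\lambda \neq 0),
\]
\[
  \dim = \underbrace{16}_{\text{matrix entries}} - \underbrace{1}_{\text{overall scale}} = 15 .
\]
% For comparison, a similarity transformation (rotation R, translation t,
% uniform scale s) has far fewer parameters:
\[
  \dim = \underbrace{3}_{R} + \underbrace{3}_{t} + \underbrace{1}_{s} = 7 .
\]
```

Under that assumption, a projective alignment carries 8 more degrees of freedom than a similarity alignment, which is the extra room in which error can accumulate as submaps are chained together.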
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
