TL;DR
VGGT-SLAM is a dense RGB SLAM system that aligns VGGT submaps by optimizing over the SL(4) manifold, which lets it handle uncalibrated monocular cameras and improves map quality over long sequences.
Contribution
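VGGT-SLAM proposes aligning submaps by optimizing over the SL(4) manifold, estimating 15-degrees-of-freedom homographies between sequential submaps rather than similarity transforms. This recovers a consistent scene reconstruction from uncalibrated cameras despite the projective reconstruction ambiguity.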
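The degrees-of-freedom counts behind this contribution can be written out explicitly. The following is a standard projective-geometry counting argument, not a derivation taken from the paper; the symbols $s$, $R$, $t$, and $H$ are introduced here for illustration.

```latex
% Similarity transform on 3D points: scale s, rotation R, translation t
T_{\mathrm{sim}} =
\begin{bmatrix} sR & t \\ 0^{\top} & 1 \end{bmatrix},
\qquad
\underbrace{3}_{R} + \underbrace{3}_{t} + \underbrace{1}_{s} = 7 \ \text{DoF}

% Projective transform (homography) on homogeneous 3D points:
% a 4x4 matrix defined up to scale
H \in \mathbb{R}^{4 \times 4}, \quad H \sim \lambda H \ (\lambda \neq 0),
\qquad 16 - 1 = 15 \ \text{DoF}

% Fixing the scale by requiring unit determinant gives the special linear group
\mathrm{SL}(4) = \{\, H \in \mathbb{R}^{4 \times 4} : \det H = 1 \,\},
\qquad \dim \mathrm{SL}(4) = 15
```

A similarity transform therefore cannot absorb the 8 extra degrees of freedom that a projective ambiguity can introduce between two submaps.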
Findings
Achieves improved map quality on long video sequences.
Handles uncalibrated monocular camera data effectively.
Outperforms previous methods in reconstruction consistency.
Abstract
We present VGGT-SLAM, a dense RGB SLAM system constructed by incrementally and globally aligning submaps created from the feed-forward scene reconstruction approach VGGT using only uncalibrated monocular cameras. While related works align submaps using similarity transforms (i.e., translation, rotation, and scale), we show that such approaches are inadequate in the case of uncalibrated cameras. In particular, we revisit the idea of reconstruction ambiguity, where given a set of uncalibrated cameras with no assumption on the camera motion or scene structure, the scene can only be reconstructed up to a 15-degrees-of-freedom projective transformation of the true geometry. This inspires us to recover a consistent scene reconstruction across submaps by optimizing over the SL(4) manifold, thus estimating 15-degrees-of-freedom homography transforms between sequential submaps while accounting…
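To make the idea of a 15-degrees-of-freedom homography on SL(4) concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `to_sl4` and `apply_homography` and the example matrix are hypothetical, chosen only for illustration. It shows how a 4x4 homography can be rescaled to unit determinant and applied to 3D points, and that the rescaling does not change the mapped points.

```python
import numpy as np


def to_sl4(H):
    """Rescale a 4x4 matrix with positive determinant so that det(H) = 1.

    Hypothetical helper: a homography is only defined up to scale, so dividing
    by det(H)**(1/4) picks the representative that lies in SL(4).
    """
    H = np.asarray(H, dtype=float)
    if H.shape != (4, 4):
        raise ValueError("expected a 4x4 matrix")
    det = np.linalg.det(H)
    if det <= 0:
        # Scaling by s multiplies det by s**4 > 0, so the sign cannot be fixed.
        raise ValueError("determinant must be positive to normalize into SL(4)")
    return H / det ** 0.25


def apply_homography(H, points):
    """Apply a 4x4 homography to an (N, 3) array of 3D points."""
    points = np.asarray(points, dtype=float)
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones])   # (N, 4) homogeneous coordinates
    mapped = homogeneous @ np.asarray(H, dtype=float).T
    return mapped[:, :3] / mapped[:, 3:4]     # divide out the projective coordinate


# Illustrative matrix (an assumption, not data from the paper): the nonzero
# entry in the last row makes it projective, not a similarity transform.
H = np.array([
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.1, 0.0, 0.0, 1.0],
])
H_sl4 = to_sl4(H)
pts = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 1.0]])
mapped_pts = apply_homography(H_sl4, pts)
```

Because `H` and `H_sl4` differ only by a scalar factor, they map points identically; the unit-determinant constraint simply removes the redundant scale so that each homography has a single representative to optimize over.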
Taxonomy
Methods: Sparse Evolutionary Training · ALIGN
