VGGT-SLAM++
Avilasha Mandal, Rajesh Kumar, Sudarshan Sunil Harithas, Chetan Arora

TL;DR
VGGT-SLAM++ is a visual SLAM system that improves large-scale mapping accuracy and stability by combining transformer-based visual geometry (VGGT), dense Digital Elevation Maps (DEMs), and high-cadence local bundle adjustment.
Contribution
It introduces a SLAM pipeline that combines VGGT outputs, a DEM-based covisibility graph, and a back-end that restores high-cadence local bundle adjustment, outperforming prior transformer-based SLAM methods such as VGGT-SLAM.
Findings
Achieves state-of-the-art accuracy on standard SLAM benchmarks.
Reduces short-term pose drift significantly.
Maintains global consistency with efficient DEM tiles.
Abstract
We introduce VGGT-SLAM++, a complete visual SLAM system that leverages the geometry-rich outputs of the Visual Geometry Grounded Transformer (VGGT). The system comprises a visual odometry (front-end) fusing the VGGT feed-forward transformer and a Sim(3) solution, a Digital Elevation Map (DEM)-based graph construction module, and a back-end that jointly enable accurate large-scale mapping with bounded memory. While prior transformer-based SLAM pipelines such as VGGT-SLAM rely primarily on sparse loop closures or global Sim(3) manifold constraints - allowing short-horizon pose drift - VGGT-SLAM++ restores high-cadence local bundle adjustment (LBA) through a spatially corrective back-end. For each VGGT submap, we construct a dense planar-canonical DEM, partition it into patches, and compute their DINOv2 embeddings to integrate the submap into a covisibility graph. Spatial neighbors are…
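To make the DEM step in the abstract more concrete, here is a minimal sketch of rasterizing a submap point cloud into a height grid and partitioning that grid into patches. It is an illustration under stated assumptions, not the authors' implementation. It assumes the points are already expressed in a frame whose z axis is the height direction (the paper's planar-canonical alignment is not reproduced), keeps the maximum height per cell, and leaves empty cells as None. The function names, the cell size, and the patch size are hypothetical. The DINOv2 embedding and covisibility-graph steps are omitted.

```python
import math


def _cell_index(value, origin, cell_size):
    # Index of the grid cell containing `value` along one axis.
    return int(math.floor((value - origin) / cell_size))


def rasterize_dem(points, cell_size):
    """Build a max-height grid from (x, y, z) points.

    Assumption: z is the height axis. Empty cells stay None.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    cols = _cell_index(max(xs), x0, cell_size) + 1
    rows = _cell_index(max(ys), y0, cell_size) + 1
    dem = [[None] * cols for _ in range(rows)]
    for x, y, z in points:
        r = _cell_index(y, y0, cell_size)
        c = _cell_index(x, x0, cell_size)
        if dem[r][c] is None or z > dem[r][c]:
            dem[r][c] = z
    return dem


def split_into_patches(dem, patch):
    """Partition the grid into non-overlapping tiles of at most patch x patch cells.

    Returns a dict keyed by (tile_row, tile_col).
    """
    patches = {}
    for r0 in range(0, len(dem), patch):
        for c0 in range(0, len(dem[0]), patch):
            tile = [row[c0:c0 + patch] for row in dem[r0:r0 + patch]]
            patches[(r0 // patch, c0 // patch)] = tile
    return patches


# Toy submap: four points on a 4 x 4 grid of 1.0-unit cells.
points = [(0.0, 0.0, 1.0), (0.5, 0.5, 3.0), (1.0, 0.0, 2.0), (3.0, 3.0, 5.0)]
dem = rasterize_dem(points, cell_size=1.0)
patches = split_into_patches(dem, patch=2)
```

In a pipeline of the kind the abstract describes, each tile would then be passed to an image encoder to obtain a descriptor. That step is left out here because it depends on model weights and preprocessing choices the summary does not specify.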
