VGGT-SLAM++

Avilasha Mandal; Rajesh Kumar; Sudarshan Sunil Harithas; Chetan Arora

arXiv:2604.06830 · cs.CV · April 9, 2026

TL;DR

VGGT-SLAM++ is a visual SLAM system that improves large-scale mapping accuracy and stability by combining the geometry outputs of the Visual Geometry Grounded Transformer (VGGT), dense Digital Elevation Maps (DEMs), and high-cadence local bundle adjustment.

Contribution

The paper introduces a SLAM pipeline that combines a VGGT-based visual odometry front-end, a DEM-based graph construction module, and a back-end that performs high-cadence local bundle adjustment (LBA), outperforming prior transformer-based SLAM methods.

Findings

01. Achieves state-of-the-art accuracy on standard SLAM benchmarks.

02. Reduces short-term pose drift significantly.

03. Maintains global consistency with efficient DEM tiles.

Abstract

We introduce VGGT-SLAM++, a complete visual SLAM system that leverages the geometry-rich outputs of the Visual Geometry Grounded Transformer (VGGT). The system comprises a visual odometry front-end that fuses the VGGT feed-forward transformer with a Sim(3) solution, a Digital Elevation Map (DEM)-based graph construction module, and a back-end, which jointly enable accurate large-scale mapping with bounded memory. Prior transformer-based SLAM pipelines such as VGGT-SLAM rely primarily on sparse loop closures or global Sim(3) manifold constraints, which allows short-horizon pose drift; VGGT-SLAM++ instead restores high-cadence local bundle adjustment (LBA) through a spatially corrective back-end. For each VGGT submap, we construct a dense planar-canonical DEM, partition it into patches, and compute their DINOv2 embeddings to integrate the submap into a covisibility graph. Spatial neighbors are…
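The DEM step in the abstract (build a height map per submap, then partition it into patches) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `rasterize_dem` and `partition_into_patches`, the `cell_size` and `patch_size` parameters, and the max-height-per-cell rule are all assumptions chosen for illustration. The sketch also assumes the points are already expressed in a frame whose z axis is the reference-plane normal, and it leaves out the DINOv2 embedding and covisibility-graph steps.

```python
def rasterize_dem(points, cell_size):
    """Rasterize (x, y, z) points into a height grid over the x-y plane.

    Hypothetical helper: keeps the highest z per cell (an assumed rule, not
    taken from the paper). Returns (dem, origin), where dem[row][col] is a
    height or None for an empty cell, and origin is the (min_x, min_y) corner.
    """
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    cells = {}
    for x, y, z in points:
        col = int((x - min_x) // cell_size)
        row = int((y - min_y) // cell_size)
        key = (row, col)
        if key not in cells or z > cells[key]:
            cells[key] = z
    n_rows = max(r for r, _ in cells) + 1
    n_cols = max(c for _, c in cells) + 1
    dem = [[cells.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]
    return dem, (min_x, min_y)


def partition_into_patches(dem, patch_size):
    """Split a DEM into non-overlapping square tiles of patch_size cells.

    Returns a dict mapping (patch_row, patch_col) to a list of rows.
    Tiles on the right and bottom edges may be smaller than patch_size.
    """
    patches = {}
    for r0 in range(0, len(dem), patch_size):
        for c0 in range(0, len(dem[0]), patch_size):
            patches[(r0 // patch_size, c0 // patch_size)] = [
                row[c0:c0 + patch_size] for row in dem[r0:r0 + patch_size]
            ]
    return patches


# Toy submap: four points, 1 m cells (values are made up for the example).
points = [(0.0, 0.0, 1.0), (0.4, 0.1, 2.0), (1.2, 0.0, 0.5), (0.0, 1.5, 3.0)]
dem, origin = rasterize_dem(points, cell_size=1.0)
patches = partition_into_patches(dem, patch_size=1)
```

In a pipeline like the one described, each patch would then be passed to an image encoder to obtain a descriptor for graph construction. How empty cells are filled before encoding is a design choice this sketch leaves open.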
