LiDAR-VGGT: Cross-Modal Coarse-to-Fine Fusion for Globally Consistent and Metric-Scale Dense Mapping
Lijie Wang, Lianjie Guo, Ziyi Xu, Qianhao Wang, Fei Gao, Xieyuanli Chen

TL;DR
LiDAR-VGGT introduces a two-stage fusion framework that combines LiDAR inertial odometry with VGGT to produce accurate, large-scale, colored 3D maps with improved global consistency and metric accuracy for robotics applications.
Contribution
The paper presents a novel coarse-to-fine fusion pipeline that tightly couples LiDAR inertial odometry with VGGT, addressing scalability and scale distortion issues in large environments.
Findings
Outperforms existing VGGT-based and LIVO methods in dense mapping accuracy.
Produces globally consistent, colored point clouds in large-scale environments.
Demonstrates robustness across multiple datasets.
Abstract
Reconstructing large-scale colored point clouds is an important task in robotics, supporting perception, navigation, and scene understanding. Despite advances in LiDAR inertial visual odometry (LIVO), its performance remains highly sensitive to extrinsic calibration. Meanwhile, 3D vision foundation models, such as VGGT, suffer from limited scalability in large environments and inherently lack metric scale. To overcome these limitations, we propose LiDAR-VGGT, a novel framework that tightly couples LiDAR inertial odometry with the state-of-the-art VGGT model through a two-stage coarse-to-fine fusion pipeline: First, a pre-fusion module with robust initialization refinement efficiently estimates VGGT poses and point clouds with coarse metric scale within each session. Then, a post-fusion module enhances cross-modal 3D similarity transformation, using bounding-box-based regularization to…
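The abstract's post-fusion step centers on a cross-modal 3D similarity transformation, i.e. a scale, a rotation, and a translation that map scale-ambiguous VGGT geometry into the metric LiDAR frame. A common closed-form way to estimate such a Sim(3) transform from point correspondences is the Umeyama method; the sketch below shows that standard technique only. It is not presented as the authors' implementation: the function name `umeyama_sim3`, the assumption that VGGT camera centers are paired one-to-one with time-aligned LiDAR odometry positions, and the synthetic data are all illustrative, and the paper's robust initialization and bounding-box-based regularization are not modeled.

```python
import numpy as np


def umeyama_sim3(src: np.ndarray, dst: np.ndarray):
    """Closed-form Sim(3) fit (Umeyama): find s, R, t with dst ~= s * R @ src + t.

    Hypothetical usage: src holds VGGT camera centers (arbitrary scale) and dst
    holds LiDAR-odometry positions (metric) at the same timestamps, both (N, 3).
    """
    assert src.shape == dst.shape and src.shape[1] == 3
    n = src.shape[0]
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    x = src - mu_src                      # centered source points
    y = dst - mu_dst                      # centered target points
    var_src = (x ** 2).sum() / n          # mean squared deviation of the source
    cov = y.T @ x / n                     # 3x3 cross-covariance (target x source)
    u, d, vt = np.linalg.svd(cov)
    fix = np.eye(3)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        fix[2, 2] = -1.0                  # flip one axis so R is a proper rotation
    rot = u @ fix @ vt
    scale = np.trace(np.diag(d) @ fix) / var_src
    trans = mu_dst - scale * rot @ mu_src
    return scale, rot, trans


# Synthetic check: a known scale, rotation, and translation should be recovered.
rng = np.random.default_rng(0)
vggt_centers = rng.normal(size=(50, 3))
theta = np.deg2rad(30.0)
r_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
lidar_positions = 2.5 * vggt_centers @ r_true.T + t_true

s, R, t = umeyama_sim3(vggt_centers, lidar_positions)
aligned = s * vggt_centers @ R.T + t      # VGGT points expressed in the metric frame
```

With noiseless correspondences the fit is exact; with real data the residual `aligned - lidar_positions` indicates how well a single similarity transform explains the two trajectories, which is one reason a robust initialization or additional regularization, as the abstract mentions, may be needed in practice.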
Taxonomy
Topics: Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis · Advanced Vision and Imaging
