LiDAR-VGGT: Cross-Modal Coarse-to-Fine Fusion for Globally Consistent and Metric-Scale Dense Mapping

Lijie Wang; Lianjie Guo; Ziyi Xu; Qianhao Wang; Fei Gao; Xieyuanli Chen

arXiv:2511.01186 · cs.RO · November 4, 2025

Open Access

TL;DR

LiDAR-VGGT introduces a two-stage fusion framework combining LiDAR odometry and VGGT to produce accurate, large-scale, colored 3D maps with improved global consistency and metric accuracy in robotics applications.

Contribution

The paper presents a coarse-to-fine fusion pipeline that tightly couples LiDAR inertial odometry with VGGT, addressing VGGT's limited scalability and lack of metric scale in large environments.

Findings

01 Outperforms existing VGGT-based and LIVO methods in dense mapping accuracy.

02 Produces globally consistent, colored point clouds in large-scale environments.

03 Demonstrates robustness across multiple datasets.

Abstract

Reconstructing large-scale colored point clouds is an important task in robotics, supporting perception, navigation, and scene understanding. Despite advances in LiDAR inertial visual odometry (LIVO), its performance remains highly sensitive to extrinsic calibration. Meanwhile, 3D vision foundation models, such as VGGT, suffer from limited scalability in large environments and inherently lack metric scale. To overcome these limitations, we propose LiDAR-VGGT, a novel framework that tightly couples LiDAR inertial odometry with the state-of-the-art VGGT model through a two-stage coarse-to-fine fusion pipeline: First, a pre-fusion module with robust initialization refinement efficiently estimates VGGT poses and point clouds with coarse metric scale within each session. Then, a post-fusion module enhances cross-modal 3D similarity transformation, using bounding-box-based regularization to…
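The abstract refers to a cross-modal 3D similarity transformation that brings scale-free VGGT output into the metric LiDAR frame. As a rough illustration of that general idea only, the sketch below estimates a similarity transform (scale, rotation, translation) between two sets of corresponding 3D points using the classical closed-form Umeyama alignment. This is not the paper's pre-fusion or post-fusion module: the function name `umeyama_sim3`, the assumption that matched VGGT and LiDAR positions are already available, and the absence of any bounding-box-based regularization are all simplifications made here.

```python
import numpy as np


def umeyama_sim3(src, dst):
    """Closed-form least-squares similarity transform (Umeyama alignment).

    Returns (s, R, t) such that dst is approximately s * R @ src + t.
    Hypothetical helper for illustration; src could be scale-free camera
    positions and dst the matching metric LiDAR odometry positions.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Center both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between target and source, then its SVD.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


def apply_sim3(points, s, R, t):
    """Map an (N, 3) array of points through the similarity transform."""
    return s * np.asarray(points, dtype=float) @ R.T + t
```

With such an estimate, a scale-free point cloud could be mapped into the metric frame via `apply_sim3(cloud, s, R, t)`. A closed-form fit like this assumes clean correspondences and a single global scale, so it is best read as a baseline for the kind of alignment the abstract describes rather than a substitute for it.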

Taxonomy

Topics: Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis · Advanced Vision and Imaging