M^3: Dense Matching Meets Multi-View Foundation Models for Monocular Gaussian Splatting SLAM

Kerui Ren; Guanghao Li; Changjian Jiang; Yingxiang Xu; Tao Lu; Linning Xu; Junting Dong; Jiangmiao Pang; Mulin Yu; Bo Dai

arXiv:2603.16844 · cs.CV · March 18, 2026

TL;DR

M^3 augments a multi-view foundation model with a dense matching head to improve the accuracy and robustness of monocular SLAM in dynamic environments, achieving state-of-the-art results.

Contribution

M^3 couples dense correspondence estimation from a multi-view foundation model with monocular Gaussian Splatting SLAM, improving both pose accuracy and scene reconstruction quality.

Findings

01

Reduces ATE RMSE by 64.3% compared to VGGT-SLAM 2.0

02

Outperforms ARTDECO by 2.11 dB in PSNR on ScanNet++

03

Achieves state-of-the-art accuracy across diverse benchmarks
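
The two headline metrics can be made concrete with a small sketch. ATE RMSE is commonly computed after aligning estimated camera positions to ground truth. For monocular methods this alignment usually includes scale (a Sim(3) fit, here via Umeyama's closed-form method). PSNR is 10·log10(MAX²/MSE). The function names and the choice of Sim(3) alignment are assumptions for illustration; the paper's exact evaluation protocol is not stated on this page.

```python
import numpy as np


def umeyama_sim3(src: np.ndarray, dst: np.ndarray):
    """Least-squares similarity (s, R, t) such that dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding camera positions.
    """
    n = src.shape[0]
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_src, dst - mu_dst
    cov = xd.T @ xs / n                      # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                       # prevent a reflection
    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / n            # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """ATE RMSE after Sim(3) alignment of estimated to ground-truth positions.

    Assumption: scale is aligned too, as is typical for monocular evaluation.
    """
    s, R, t = umeyama_sim3(est, gt)
    aligned = s * est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))


def psnr(img: np.ndarray, ref: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = np.mean((img - ref) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

As a rough reading of the findings: a 64.3% ATE reduction means M^3's error is about 35.7% of VGGT-SLAM 2.0's, and for a single image at a fixed peak value, a 2.11 dB PSNR gain corresponds to scaling the MSE by 10^(-0.211) ≈ 0.62, roughly 38% lower squared error.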

Abstract

Streaming reconstruction from uncalibrated monocular video remains challenging, as it requires both high-precision pose estimation and computationally efficient online refinement in dynamic environments. While coupling 3D foundation models with SLAM frameworks is a promising paradigm, a critical bottleneck persists: most multi-view foundation models estimate poses in a feed-forward manner, yielding pixel-level correspondences that lack the requisite precision for rigorous geometric optimization. To address this, we present M^3, which augments the Multi-view foundation model with a dedicated Matching head to facilitate fine-grained dense correspondences and integrates it into a robust Monocular Gaussian Splatting SLAM. M^3 further enhances tracking stability by incorporating dynamic area suppression and cross-inference intrinsic alignment. Extensive experiments on diverse indoor and…
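
The abstract's central idea is that a dedicated matching head yields fine-grained dense correspondences suitable for geometric optimization. This page does not describe the head's architecture or how matches are extracted, so the following is only a generic sketch under stated assumptions: given two per-pixel descriptor maps (a hypothetical stand-in for the matching head's output), keep pixel pairs that are mutual nearest neighbors under cosine similarity. `mutual_nn_matches` is a hypothetical name, and mutual nearest-neighbor filtering is a common generic choice, not necessarily what M^3 does.

```python
import numpy as np


def mutual_nn_matches(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    """Dense mutual-nearest-neighbor matching between two descriptor maps.

    desc_a: (Ha, Wa, C), desc_b: (Hb, Wb, C) per-pixel descriptors
    (hypothetical output of a matching head).
    Returns an (M, 4) int array of [ya, xa, yb, xb] pixel correspondences.
    """
    Ha, Wa, C = desc_a.shape
    Hb, Wb, _ = desc_b.shape
    fa = desc_a.reshape(-1, C)
    fb = desc_b.reshape(-1, C)
    # L2-normalize so the dot product is cosine similarity.
    fa = fa / np.linalg.norm(fa, axis=1, keepdims=True)
    fb = fb / np.linalg.norm(fb, axis=1, keepdims=True)
    sim = fa @ fb.T                      # (Ha*Wa, Hb*Wb) similarity matrix
    nn_ab = sim.argmax(axis=1)           # best pixel in B for each pixel in A
    nn_ba = sim.argmax(axis=0)           # best pixel in A for each pixel in B
    idx_a = np.arange(fa.shape[0])
    mutual = nn_ba[nn_ab] == idx_a       # keep only mutually consistent pairs
    ia, ib = idx_a[mutual], nn_ab[mutual]
    # Convert flat indices back to (row, col) pixel coordinates.
    return np.stack([ia // Wa, ia % Wa, ib // Wb, ib % Wb], axis=1)
```

The full similarity matrix is (H·W)×(H·W), so this brute-force version only suits small maps; practical dense matchers typically work coarse-to-fine or in chunks. The resulting pixel pairs are the kind of correspondences that could feed a reprojection-error objective during pose optimization.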

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques