S-MUSt3R: Sliding Multi-view 3D Reconstruction

Leonid Antsfeld; Boris Chidlovskii; Yohann Cabon; Vincent Leroy; Jerome Revaud

arXiv:2602.04517·cs.CV·February 5, 2026

Open Access

TL;DR

S-MUSt3R is a scalable pipeline for monocular 3D reconstruction built on foundation models. It processes long sequences and produces metric-space predictions without retraining the underlying model.

Contribution

S-MUSt3R extends foundation models to large-scale 3D reconstruction through sequence segmentation, segment alignment, and lightweight loop closure optimization, without requiring model retraining.
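The segmentation step can be pictured as a sliding window over the frame indices. The following is a minimal sketch, not the authors' implementation: the function name `split_into_segments`, the parameters `segment_len` and `overlap`, and the choice to share a fixed number of frames between consecutive segments are all assumptions made for illustration.

```python
def split_into_segments(num_frames, segment_len, overlap):
    """Split range(num_frames) into windows of at most segment_len frames.

    Consecutive windows share `overlap` frames, which a later alignment
    step could use as common observations. Returns (start, end) pairs
    with `end` exclusive. All names here are hypothetical.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must satisfy 0 <= overlap < segment_len")
    if num_frames <= 0:
        return []

    stride = segment_len - overlap  # how far each window advances
    segments = []
    start = 0
    while True:
        end = min(start + segment_len, num_frames)
        segments.append((start, end))
        if end == num_frames:  # the last window reaches the final frame
            break
        start += stride
    return segments


if __name__ == "__main__":
    # 10 frames, windows of 4 frames, 1 shared frame between neighbours
    print(split_into_segments(10, 4, 1))
```

Bounding the window length is what keeps memory use per model call fixed regardless of the total sequence length; the shared frames give neighbouring segments something to be aligned on.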

Findings

01 Achieves trajectory and reconstruction performance comparable to traditional methods

02 Successfully processes long RGB sequences

03 Produces accurate and consistent 3D reconstructions

Abstract

The recent paradigm shift in 3D vision has led to the rise of foundation models with remarkable capabilities in 3D perception from uncalibrated images. However, extending these models to large-scale 3D reconstruction from RGB streams remains challenging due to memory limitations. This work proposes S-MUSt3R, a simple and efficient pipeline that extends the limits of foundation models for monocular 3D reconstruction. Our approach addresses the scalability bottleneck of foundation models through a simple strategy of sequence segmentation followed by segment alignment and lightweight loop closure optimization. Without model retraining, we benefit from the remarkable 3D reconstruction capabilities of the MUSt3R model and achieve trajectory and reconstruction performance comparable to traditional methods with more complex architectures. We evaluate S-MUSt3R on TUM, 7-Scenes and proprietary robot navigation…
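The abstract does not say how segments are aligned. One common closed-form option for registering two point sets with known correspondences is the Umeyama similarity alignment, sketched below with NumPy. This is an illustrative assumption, not a description of the S-MUSt3R alignment step; the function name `umeyama_alignment` and the idea of using 3D points from frames shared by two segments as correspondences are hypothetical.

```python
import numpy as np


def umeyama_alignment(src, dst):
    """Estimate scale s, rotation R, translation t with dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points, for example points
    predicted for the same pixels in two overlapping segments (an assumption
    made for this sketch).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between the centred point sets
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    scale = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - scale * (R @ mu_src)
    return scale, R, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(50, 3))
    angle = 0.3
    R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle), np.cos(angle), 0.0],
                       [0.0, 0.0, 1.0]])
    moved = 2.5 * pts @ R_true.T + np.array([1.0, -2.0, 0.5])
    s, R, t = umeyama_alignment(pts, moved)
    print(np.isclose(s, 2.5), np.allclose(R, R_true))
```

Chaining such pairwise transforms places every segment in one global frame, but small errors accumulate along the chain, which is the kind of drift a loop closure optimization is meant to correct.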

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · 3D Shape Modeling and Analysis