S-MUSt3R: Sliding Multi-view 3D Reconstruction
Leonid Antsfeld, Boris Chidlovskii, Yohann Cabon, Vincent Leroy, Jerome Revaud

TL;DR
S-MUSt3R introduces a scalable pipeline for monocular 3D reconstruction using foundation models, enabling long-sequence processing and metric-space predictions without retraining.
Contribution
It extends foundation models to large-scale 3D reconstruction through sequence segmentation, segment alignment, and lightweight loop closure optimization, without requiring model retraining.
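The segmentation step can be illustrated with a minimal sketch. The summary gives no segment lengths or overlap sizes, so `split_into_segments`, `segment_len` and `overlap` below are hypothetical. The sketch only shows one way a long frame sequence might be cut into overlapping windows. Each window is small enough to process within memory limits, and its shared frames give neighboring segments something to be aligned on.

```python
from __future__ import annotations


def split_into_segments(num_frames: int, segment_len: int, overlap: int) -> list[range]:
    """Cut frame indices 0..num_frames-1 into overlapping windows.

    Hypothetical helper: segment_len and overlap are illustrative parameters,
    not values taken from S-MUSt3R. Consecutive windows share exactly
    `overlap` frames; only the last window may be shorter than segment_len.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must satisfy 0 <= overlap < segment_len")
    if num_frames <= 0:
        return []

    stride = segment_len - overlap  # how far each new window advances
    segments = []
    start = 0
    while True:
        end = min(start + segment_len, num_frames)
        segments.append(range(start, end))
        if end == num_frames:  # the sequence is fully covered
            break
        start += stride
    return segments


# Example: 10 frames, windows of 4 frames, 1 shared frame between neighbors.
print(split_into_segments(10, 4, 1))  # [range(0, 4), range(3, 7), range(6, 10)]
```

A larger overlap gives the alignment step more shared frames to work with, at the cost of processing more frames overall.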
Findings
Achieves trajectory and reconstruction performance comparable to traditional methods with more complex architectures
Successfully processes long RGB sequences
Produces accurate and consistent 3D reconstructions
Abstract
The recent paradigm shift in 3D vision has led to the rise of foundation models with remarkable capabilities in 3D perception from uncalibrated images. However, extending these models to 3D reconstruction from large-scale RGB streams remains challenging due to memory limitations. This work proposes S-MUSt3R, a simple and efficient pipeline that extends the limits of foundation models for monocular 3D reconstruction. Our approach addresses the scalability bottleneck of foundation models through a simple strategy of sequence segmentation, followed by segment alignment and lightweight loop closure optimization. Without model retraining, we benefit from the remarkable 3D reconstruction capabilities of the MUSt3R model and achieve trajectory and reconstruction performance comparable to traditional methods with more complex architectures. We evaluate S-MUSt3R on TUM, 7-Scenes and proprietary robot navigation…
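The abstract does not describe how segments are aligned. One standard option, assumed here purely for illustration, is to estimate a similarity transform (scale, rotation, translation) between the 3D points that two overlapping segments predict for their shared frames, using the closed-form Umeyama method. The function names `umeyama_sim3` and `apply_sim3` are hypothetical. This is not presented as S-MUSt3R's actual procedure, and the loop closure optimization is not covered.

```python
import numpy as np


def umeyama_sim3(src: np.ndarray, dst: np.ndarray):
    """Closed-form similarity alignment (Umeyama, 1991).

    Finds scale s, rotation R and translation t minimizing
    sum_i ||dst_i - (s * R @ src_i + t)||^2.
    src, dst: (N, 3) arrays of corresponding 3D points, N >= 3, not collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Center both point sets on their centroids.
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered sets and its SVD.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = float(np.trace(np.diag(D) @ S) / var_src)
    t = mu_dst - s * R @ mu_src
    return s, R, t


def apply_sim3(s: float, R: np.ndarray, t: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map (N, 3) points row-wise with x -> s * R @ x + t."""
    return s * np.asarray(pts, dtype=float) @ R.T + t


# Hypothetical usage: match the points segment B predicts for the frames it
# shares with segment A (pts_b_shared) to segment A's predictions for those
# same frames (pts_a_shared), then map all of B into A's coordinate frame.
# s, R, t = umeyama_sim3(pts_b_shared, pts_a_shared)
# segment_b_in_a = apply_sim3(s, R, t, all_points_b)
```

Estimating a scale factor alongside rotation and translation lets the alignment absorb any scale discrepancy between independently reconstructed segments. If the predictions were already consistently metric, a rigid (scale-fixed) variant could be used instead.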
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · 3D Shape Modeling and Analysis
