MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai

TL;DR
MVSplat is a fast, efficient model that predicts 3D Gaussian primitives from sparse multi-view images using a cost volume for accurate localization, achieving state-of-the-art results with fewer parameters.
Contribution
It introduces a novel cost volume-based approach for localizing 3D Gaussians in a feed-forward manner, significantly improving efficiency and accuracy over prior methods.
Findings
Achieves state-of-the-art performance on RealEstate10K and ACID benchmarks.
Runs at 22 fps with fewer parameters than previous methods.
Provides higher quality and better generalization across datasets.
Abstract
We introduce MVSplat, an efficient model that, given sparse multi-view images as input, predicts clean feed-forward 3D Gaussians. To accurately localize the Gaussian centers, we build a cost volume representation via plane sweeping, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We also learn other Gaussian primitives' parameters jointly with the Gaussian centers while only relying on photometric supervision. We demonstrate the importance of the cost volume representation in learning feed-forward Gaussians via extensive experimental evaluations. On the large-scale RealEstate10K and ACID benchmarks, MVSplat achieves state-of-the-art performance with the fastest feed-forward inference speed (22 fps). More impressively, compared to the latest state-of-the-art method pixelSplat, MVSplat uses fewer…
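The plane-sweep cost volume mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the nearest-neighbor sampling, the scaled dot-product similarity, the shared intrinsics matrix `K`, and the softmax depth readout are all assumptions chosen to keep the example short. It only shows the geometric idea: for each candidate depth, source-view features are warped into the reference view and compared with the reference features, so the depth with the highest similarity is the most plausible one.

```python
import numpy as np

def plane_sweep_cost_volume(feat_ref, feat_src, K, R, t, depths):
    """Build a (D, H, W) cost volume of feature similarities.

    feat_ref, feat_src: (C, H, W) feature maps of the reference and source view.
    K: (3, 3) intrinsics at feature resolution (assumed shared by both views).
    R, t: pose mapping reference-camera points to source-camera points,
          X_src = R @ X_ref + t.
    depths: (D,) candidate depths in the reference camera.
    """
    C, H, W = feat_ref.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    rays = np.linalg.inv(K) @ pix              # back-projected rays with z = 1
    ref_flat = feat_ref.reshape(C, -1)
    cost = np.zeros((len(depths), H, W))

    for i, d in enumerate(depths):
        # Lift every reference pixel to depth d, then project into the source view.
        proj = K @ (R @ (rays * d) + t[:, None])
        z = proj[2]
        valid = z > 1e-6
        us = np.full(z.shape, -1, dtype=np.int64)
        vs = np.full(z.shape, -1, dtype=np.int64)
        us[valid] = np.rint(proj[0, valid] / z[valid]).astype(np.int64)
        vs[valid] = np.rint(proj[1, valid] / z[valid]).astype(np.int64)
        valid &= (us >= 0) & (us < W) & (vs >= 0) & (vs < H)

        # Nearest-neighbor sampling (a simplification); out-of-view pixels stay zero.
        warped = np.zeros((C, H * W))
        warped[:, valid] = feat_src[:, vs[valid], us[valid]]

        # Scaled dot-product similarity between reference and warped features.
        cost[i] = (ref_flat * warped).sum(axis=0).reshape(H, W) / np.sqrt(C)
    return cost

def soft_argmax_depth(cost, depths, temperature=1.0):
    """Softmax over the depth axis, then the expected depth per pixel."""
    logits = cost / temperature
    logits = logits - logits.max(axis=0, keepdims=True)
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    return np.tensordot(np.asarray(depths, dtype=np.float64), prob, axes=(0, 0))
```

In a feed-forward Gaussian model, a per-pixel depth obtained this way could be unprojected along the camera ray to place a Gaussian center; the remaining Gaussian parameters would come from separate predictions.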
Taxonomy
Topics: Medical Image Segmentation Techniques · Advanced Image and Video Retrieval Techniques · Image Processing Techniques and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
