MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Yuedong Chen; Haofei Xu; Chuanxia Zheng; Bohan Zhuang; Marc Pollefeys; Andreas Geiger; Tat-Jen Cham; Jianfei Cai

arXiv:2403.14627·cs.CV·October 29, 2024·5 cites

TL;DR

MVSplat is a fast, efficient model that predicts 3D Gaussian primitives from sparse multi-view images using a cost volume for accurate localization, achieving state-of-the-art results with fewer parameters.

Contribution

It introduces a novel cost volume-based approach for localizing 3D Gaussians in a feed-forward manner, significantly improving efficiency and accuracy over prior methods.

Findings

1. Achieves state-of-the-art performance on the RealEstate10K and ACID benchmarks.

2. Runs feed-forward inference at 22 fps with fewer parameters than previous methods.

3. Delivers higher output quality and better cross-dataset generalization than prior feed-forward methods.

Abstract

We introduce MVSplat, an efficient model that, given sparse multi-view images as input, predicts clean feed-forward 3D Gaussians. To accurately localize the Gaussian centers, we build a cost volume representation via plane sweeping, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We also learn other Gaussian primitives' parameters jointly with the Gaussian centers while only relying on photometric supervision. We demonstrate the importance of the cost volume representation in learning feed-forward Gaussians via extensive experimental evaluations. On the large-scale RealEstate10K and ACID benchmarks, MVSplat achieves state-of-the-art performance with the fastest feed-forward inference speed (22 fps). More impressively, compared to the latest state-of-the-art method pixelSplat, MVSplat uses $10\times$ fewer…
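The cost-volume idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a rectified two-view setup in which each depth candidate corresponds to an integer horizontal pixel shift (disparity), whereas a real plane sweep warps features using camera intrinsics and poses. The function names `build_cost_volume` and `expected_depth`, the `focal_times_baseline` constant, and the random stand-in features are all hypothetical.

```python
import numpy as np


def build_cost_volume(feat_ref, feat_src, disparities):
    """Correlate reference features with source features shifted per candidate.

    feat_ref, feat_src: arrays of shape (C, H, W).
    disparities: non-negative integer pixel shifts, one per depth candidate.
    Returns an array of shape (D, H, W) of scaled dot-product similarities.
    """
    channels, height, width = feat_ref.shape
    volume = np.zeros((len(disparities), height, width), dtype=feat_ref.dtype)
    for i, d in enumerate(disparities):
        # Simplified "warp": reference pixel x is compared with source pixel x - d.
        warped = np.zeros_like(feat_src)
        warped[:, :, d:] = feat_src[:, :, : width - d]
        volume[i] = (feat_ref * warped).sum(axis=0) / np.sqrt(channels)
    return volume


def expected_depth(volume, depth_candidates, temperature=1.0):
    """Softmax over the candidate axis, then a probability-weighted mean depth."""
    logits = volume / temperature
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob = prob / prob.sum(axis=0, keepdims=True)
    candidates = np.asarray(depth_candidates, dtype=prob.dtype)
    return np.tensordot(candidates, prob, axes=(0, 0))  # shape (H, W)


# Toy data: random features stand in for learned per-view features (assumption).
rng = np.random.default_rng(0)
C, H, W = 128, 4, 12
true_disparity = 3
feat_src = rng.standard_normal((C, H, W))
feat_ref = np.zeros_like(feat_src)
feat_ref[:, :, true_disparity:] = feat_src[:, :, : W - true_disparity]

disparities = [1, 2, 3, 4, 5, 6]
focal_times_baseline = 12.0  # hypothetical constant: depth = f * b / disparity
depth_candidates = [focal_times_baseline / d for d in disparities]

volume = build_cost_volume(feat_ref, feat_src, disparities)
depth = expected_depth(volume, depth_candidates)
best = np.argmax(volume, axis=0)  # index of the most similar candidate per pixel
```

In this toy setup the similarity peaks at the candidate matching the true shift wherever the two views overlap, which is the geometric cue the abstract describes. A depth map obtained this way would then be unprojected to place one Gaussian center per pixel.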

Code & Models

Repositories

donydchen/mvsplat
jax · Official

Models

🤗 dylanebert/mvsplat
model · ♡ 5

Taxonomy

Topics: Medical Image Segmentation Techniques · Advanced Image and Video Retrieval Techniques · Image Processing Techniques and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings