MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

Xuanyu Yi; Zike Wu; Qiuhong Shen; Qingshan Xu; Pan Zhou; Joo-Hwee Lim; Shuicheng Yan; Xinchao Wang; Hanwang Zhang

arXiv:2406.06367 · cs.CV · December 17, 2024

TL;DR

MVGamba introduces a lightweight, unified 3D content generation model that treats Gaussian reconstruction as state space sequence modeling, improving multi-view consistency and detail in 3D reconstructions at a lower computational cost.

Contribution

The paper proposes a Gaussian reconstruction model built on RNN-like State Space Models (SSMs) that improves multi-view information propagation and integrates with multi-view diffusion models.

Findings

1. Outperforms state-of-the-art baselines in 3D generation tasks
2. Achieves high-quality 3D content with roughly 0.1× the model size
3. Improves multi-view consistency and detail in reconstructions

Abstract

Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in under a second by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as the 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-view inconsistency and blurred textures. We attribute this to the compromise of multi-view information propagation in favor of adopting powerful yet computationally intensive architectures (e.g., Transformers). To address this issue, we introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor based on the RNN-like State Space Model (SSM). Our Gaussian reconstructor propagates causal context containing multi-view information for…
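The abstract's central mechanism is a causal, RNN-like scan over tokens drawn from several views, so that each token's output summarizes everything earlier in the sequence, including earlier views. The sketch below illustrates that idea with a generic diagonal linear SSM discretized by zero-order hold, run over per-view token sequences concatenated in view order. It is a hedged illustration, not MVGamba's reconstructor: the names `ssm_scan` and `flatten_views`, the scalar tokens, the fixed (non-selective) parameters, and the view-order flattening are all assumptions made for clarity.

```python
import math
from typing import List, Sequence


def flatten_views(views: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-view token sequences in view order (hypothetical ordering)."""
    return [tok for view in views for tok in view]


def ssm_scan(
    xs: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
    delta: float,
) -> List[float]:
    """Causal scan of a diagonal linear SSM over a scalar input sequence.

    Continuous form: h'(t) = A h(t) + B x(t), y(t) = C h(t), with A = diag(a), a_i < 0.
    Zero-order-hold discretization with step `delta` gives, per state dimension i:
        a_bar_i = exp(delta * a_i)
        b_bar_i = (a_bar_i - 1) / a_i * b_i
    and the recurrence h_t = a_bar * h_{t-1} + b_bar * x_t, y_t = sum_i c_i * h_t[i].
    """
    if any(ai >= 0 for ai in a):
        raise ValueError("diagonal entries of A must be negative for a stable scan")
    a_bar = [math.exp(delta * ai) for ai in a]
    b_bar = [(abi - 1.0) / ai * bi for abi, ai, bi in zip(a_bar, a, b)]
    h = [0.0] * len(a)  # hidden state carried across tokens (and hence across views)
    ys: List[float] = []
    for x in xs:
        # Each step sees only the current token and the state built from earlier ones.
        h = [abi * hi + bbi * x for abi, hi, bbi in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


if __name__ == "__main__":
    a, b, c = [-1.0, -0.5], [1.0, 1.0], [0.5, 0.5]
    views = [[1.0, 0.5], [0.2, -0.3], [0.0, 0.8]]  # 3 views x 2 tokens each
    edited = [[1.0, 0.5], [0.2, -0.3], [9.0, 9.0]]  # only the last view changes
    y = ssm_scan(flatten_views(views), a, b, c, delta=0.1)
    y_edit = ssm_scan(flatten_views(edited), a, b, c, delta=0.1)
    # Outputs for the first two views are unaffected by the last view (causality).
    print(y[:4] == y_edit[:4])  # prints True
```

Because the recurrence touches each token once, its cost grows linearly with sequence length, whereas full self-attention over the same tokens grows quadratically. Mamba-style selective SSMs additionally make the step size and the B and C parameters depend on the input, which this sketch deliberately omits.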

Taxonomy

Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques · Human Motion and Animation

Methods: Diffusion