SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexander Schwing, and Liangyan Gui

TL;DR
SDFusion is a versatile framework that enables multimodal 3D shape completion, reconstruction, and generation, allowing users to interactively generate and modify 3D assets using images, text, and partial shapes.
Contribution
It introduces a flexible, multi-modal diffusion-based model that unifies various 3D shape tasks into a single system with adjustable input influence.
Findings
Outperforms prior methods on shape completion, image-based 3D reconstruction, and text-to-3D tasks.
Supports combined multi-modal inputs for interactive shape generation.
Provides an efficient way to texture generated shapes using large-scale text-to-image models.
Abstract
In this work, we present a novel framework built to simplify 3D asset generation for amateur users. To enable interactive generation, our method supports a variety of input modalities that can be easily provided by a human, including images, text, partially observed shapes, and combinations of these, and further allows the strength of each input to be adjusted. At the core of our approach is an encoder-decoder that compresses 3D shapes into a compact latent representation, upon which a diffusion model is learned. To enable a variety of multi-modal inputs, we employ task-specific encoders with dropout followed by a cross-attention mechanism. Due to its flexibility, our model naturally supports a variety of tasks, outperforming prior works on shape completion, image-based 3D reconstruction, and text-to-3D. Most interestingly, our model can combine all these tasks into one Swiss-army-knife tool,…
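The abstract mentions two mechanisms that a short example can make concrete: dropout applied to the task-specific encoders, and an adjustable strength for each input. The sketch below is not the authors' code. It assumes a common pattern, multi-condition classifier-free guidance, in which each modality's embedding is randomly replaced by a null (all-zero) embedding during training and per-modality noise predictions are blended with user-chosen weights at sampling time. The names `drop_conditions`, `guided_noise`, `p_drop`, and `weights` are hypothetical, and the noise predictions are stand-in arrays rather than outputs of a real denoiser.

```python
import numpy as np


def drop_conditions(conds, p_drop, rng):
    """Training-time condition dropout (assumed scheme, not from the paper).

    Each modality's embedding is independently replaced by zeros with
    probability p_drop, so a denoiser trained on the result also learns
    to predict noise when that input is missing.
    """
    out = {}
    for name, emb in conds.items():
        keep = rng.random() >= p_drop
        out[name] = emb if keep else np.zeros_like(emb)
    return out


def guided_noise(eps_uncond, eps_cond, weights):
    """Blend per-modality noise predictions with adjustable strengths.

    eps_uncond: prediction with every condition dropped.
    eps_cond:   dict mapping modality name to the prediction with only
                that modality present.
    weights:    dict mapping modality name to its guidance strength;
                a missing entry is treated as strength 0.
    """
    eps = eps_uncond.copy()
    for name, eps_c in eps_cond.items():
        eps += weights.get(name, 0.0) * (eps_c - eps_uncond)
    return eps


# Stand-in predictions for a 3-element latent (hypothetical values).
eps_uncond = np.zeros(3)
eps_cond = {"text": np.ones(3), "image": 2.0 * np.ones(3)}

# Full-strength text, half-strength image.
blended = guided_noise(eps_uncond, eps_cond, {"text": 1.0, "image": 0.5})
```

Setting a modality's weight to 0 removes its influence entirely, while larger weights pull the sample more strongly toward that input, which is one way to realize the adjustable input influence described above.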
Taxonomy
Topics: Image Processing and 3D Reconstruction · Human Pose and Action Recognition · 3D Shape Modeling and Analysis
Methods: Diffusion · Dropout
