Sin3DM: Learning a Diffusion Model from a Single 3D Textured Shape
Rundi Wu, Ruoshi Liu, Carl Vondrick, Changxi Zheng

TL;DR
Sin3DM introduces a diffusion model trained on a single 3D textured shape. It generates high-quality variations of that shape with detailed geometry and texture, and keeps memory and compute costs manageable by training in a compressed latent space rather than directly in 3D.
Contribution
The paper presents Sin3DM, a method for learning a diffusion model from a single 3D textured shape by first encoding the shape into a latent space of triplane features, which improves the quality of generated shape variations over prior methods.
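The triplane representation mentioned above can be illustrated with a small query routine. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes three axis-aligned feature planes of shape (C, H, W) keyed "xy", "xz" and "yz", point coordinates normalized to [-1, 1], and summation as the way the three sampled features are combined. The helper names bilinear_sample and query_triplane are hypothetical. In a full model, a small decoder network would then map each C-dimensional feature to a signed distance and a color.

```python
import numpy as np


def bilinear_sample(plane: np.ndarray, u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Bilinearly sample a (C, H, W) plane at u (width axis) and v (height axis) in [-1, 1].

    Align-corners convention: -1 maps to the first cell, +1 to the last.
    Requires H, W >= 2. Returns an (N, C) array.
    """
    _, H, W = plane.shape
    x = (u + 1.0) * 0.5 * (W - 1)
    y = (v + 1.0) * 0.5 * (H - 1)
    # Clamp the lower corner so that x0 + 1 and y0 + 1 stay in range at the +1 boundary.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    f00 = plane[:, y0, x0].T
    f01 = plane[:, y0, x0 + 1].T
    f10 = plane[:, y0 + 1, x0].T
    f11 = plane[:, y0 + 1, x0 + 1].T
    top = f00 * (1 - wx) + f01 * wx
    bottom = f10 * (1 - wx) + f11 * wx
    return top * (1 - wy) + bottom * wy


def query_triplane(planes: dict, points: np.ndarray) -> np.ndarray:
    """Map 3D points in [-1, 1]^3, shape (N, 3), to features of shape (N, C).

    Each point is projected onto the xy, xz and yz planes and the three sampled
    features are summed (summation is an assumption; concatenation is another
    common choice).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return (
        bilinear_sample(planes["xy"], x, y)
        + bilinear_sample(planes["xz"], x, z)
        + bilinear_sample(planes["yz"], y, z)
    )
```

Because every 3D query reduces to three 2D lookups, the planes can be processed by 2D convolutions, which is what makes a 2D-style denoiser over this latent plausible.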
Findings
Outperforms prior methods in shape generation quality
Enables applications like retargeting, outpainting, and editing
Efficient training via latent space encoding
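The efficiency claim comes down to simple counting: a dense feature volume grows with the product of all three resolutions, while a triplane stores only three 2D planes. The sketch below assumes C feature channels on an X by Y by Z grid; the function name latent_element_counts is hypothetical, and the channel count and resolutions used in the usage note are illustrative, not values from the paper.

```python
def latent_element_counts(c: int, x: int, y: int, z: int) -> tuple[int, int]:
    """Return (dense_volume_count, triplane_count) of stored feature values.

    Compares a C-channel dense grid of size X x Y x Z with three C-channel
    axis-aligned planes of sizes X x Y, X x Z and Y x Z (hypothetical setup).
    """
    volume = c * x * y * z
    triplane = c * (x * y + x * z + y * z)
    return volume, triplane
```

For C = 12 at a resolution of 128 along every axis, this gives 25,165,824 values for the dense volume versus 589,824 for the triplane, roughly 43 times fewer. For a cubic grid of side R the ratio is exactly R / 3, so the savings grow with resolution.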
Abstract
Synthesizing novel 3D models that resemble the input example has long been pursued by graphics artists and machine learning researchers. In this paper, we present Sin3DM, a diffusion model that learns the internal patch distribution from a single 3D textured shape and generates high-quality variations with fine geometry and texture details. Training a diffusion model directly in 3D would induce large memory and computational cost. Therefore, we first compress the input into a lower-dimensional latent space and then train a diffusion model on it. Specifically, we encode the input 3D textured shape into triplane feature maps that represent the signed distance and texture fields of the input. The denoising network of our diffusion model has a limited receptive field to avoid overfitting, and uses triplane-aware 2D convolution blocks to improve the result quality. Aside from randomly…
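Once the shape is encoded, the diffusion model is trained on the triplane latents rather than on a 3D grid. The sketch below shows only the standard DDPM forward (noising) step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, applied to a stacked triplane latent. The linear beta schedule, T = 1000, and the latent shape are common defaults assumed here for illustration; the abstract does not specify Sin3DM's schedule, and the denoiser itself (with its limited receptive field and triplane-aware blocks) is not modeled.

```python
import numpy as np


def make_alpha_bar(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed defaults)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    Returns (x_t, eps); a denoiser would be trained to recover eps (or x_0) from x_t.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


# Hypothetical latent: 3 planes x 12 channels x 64 x 64, stacked into one array.
rng = np.random.default_rng(0)
latent = rng.standard_normal((3, 12, 64, 64))
alpha_bar = make_alpha_bar()
x_t, eps = q_sample(latent, t=500, alpha_bar=alpha_bar, rng=rng)
```

Working in this compressed latent is what keeps each training step cheap: the noised tensor has the size of three 2D feature maps, not a full 3D volume.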
Peer Reviews
Decision · ICLR 2024 poster
The overall idea of using latent diffusion models to aid in learning the internal distribution of a 3D shape and the specific design choices in the method (e.g., the shape representation and the steps taken to prevent overfitting) are clever. The results shown also look nice and consistent with minimal artifacts.
It is mentioned several times that a unique benefit of the proposed method is that it takes meshes as input and produces meshes as output. This is not the case: the 3D representation is instead a triplane encoding signed distance as well as texture color. While this can indeed be converted to a mesh using marching cubes, it's not fair to claim that the method outputs a mesh. This brings me to my main concern with the paper. While the authors cite and briefly discuss [Li et al. 2023], no qualitative or quantitative comparis…
The related work section covers relevant topics and helps to set the work in context. The authors tackle a novel problem that I was not aware of before. The results look plausible and claims are supported.
One point that remains unclear to me is how the triplane-based representation can handle objects with larger extents, e.g., the building example. How can the triplane represent those? Do you increase its size? The authors claim that the model learns a distribution over patches. It is unclear to me how you choose the patch size / receptive field and how you determine that the model is not overfitting. I think more evaluation is needed to show that it is not overfitting. Maybe an ab…
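The question about patch size can be made concrete with the standard receptive-field recurrence for stacked convolutions: each layer with kernel size k, stride s and dilation d grows the receptive field by (k - 1) * d times the accumulated stride. This is a sketch that counts only in-plane 2D convolutions. The function names and the layer lists below are hypothetical, not Sin3DM's architecture, and any cross-plane aggregation in the triplane-aware blocks would change the effective 3D footprint, which is ignored here.

```python
def receptive_field(layers: list[tuple[int, int, int]]) -> int:
    """Receptive field (in latent cells) of a stack of conv layers.

    Each entry is (kernel_size, stride, dilation). Applies the recurrence
    rf += (k - 1) * d * jump, then jump *= s, where jump is the accumulated stride.
    """
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump
        jump *= s
    return rf


def coverage(rf: int, plane_size: int) -> float:
    """Fraction of one plane side spanned by a single output cell's receptive field."""
    return min(rf, plane_size) / plane_size
```

For example, four 3x3 stride-1 layers give a 9-cell receptive field, and making the second layer stride 2 raises it to 13. On a hypothetical 128-cell plane side, 13 cells is about 10% of the extent, so each output sees only a local patch; comparing this fraction with the size of the object's repeated structures is one way to reason about whether the denoiser could memorize the global layout.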
1. The edited geometry after altering the noise is satisfactory. It has rich local variation while retaining the global structure. 2. The writing is clear.
1. After training, the model can only generate 3D content with the same global structure. Since we already have the basic 3D mesh at the beginning, the application seems somewhat narrow and limited. Besides, achieving this goal requires training two models, an auto-encoder and a diffusion model. The outcome does not seem to match the effort. 2. The novelty of this work lies in the use of the tri-plane as the input of the diffusion model and the introduction of tri-plane convolution. Both aspec…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion · Convolution
