Sin3DM: Learning a Diffusion Model from a Single 3D Textured Shape

Rundi Wu; Ruoshi Liu; Carl Vondrick; Changxi Zheng

arXiv:2305.15399·cs.CV·February 22, 2024·6 cites

TL;DR

Sin3DM is a diffusion model trained on a single 3D textured shape. It generates high-quality variations of that shape with detailed geometry and texture, and keeps memory and compute costs manageable by training in a compressed latent space.

Contribution

The paper presents Sin3DM, a novel method for learning a diffusion model from a single 3D shape using latent space encoding with triplane features, improving 3D shape generation quality.
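To make the triplane idea concrete, here is a minimal pure-Python sketch of how a feature for a 3D point can be read from three axis-aligned feature planes. It is a generic illustration and not the authors' code: the names `constant_plane`, `bilinear`, and `query_triplane` are hypothetical, the point is assumed to lie in the unit cube, and summing the three plane features is one common aggregation choice assumed here. In the method described above, a decoder would then map such a feature to signed distance and texture values; that decoder is omitted.

```python
from typing import Dict, List, Sequence

Plane = List[List[List[float]]]  # indexed as plane[row][col][channel]


def constant_plane(res: int, value: float, channels: int = 1) -> Plane:
    """Build a res x res plane whose every feature equals `value` (toy data)."""
    return [[[value] * channels for _ in range(res)] for _ in range(res)]


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample a square feature plane at (u, v) in [0, 1] x [0, 1]."""
    res = len(plane)
    # Clamp to the unit square, then map to continuous pixel coordinates.
    x = min(max(u, 0.0), 1.0) * (res - 1)
    y = min(max(v, 0.0), 1.0) * (res - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, res - 1), min(y0 + 1, res - 1)
    tx, ty = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = plane[y0][x0][c] * (1 - tx) + plane[y0][x1][c] * tx
        bottom = plane[y1][x0][c] * (1 - tx) + plane[y1][x1][c] * tx
        out.append(top * (1 - ty) + bottom * ty)
    return out


def query_triplane(planes: Dict[str, Plane], point: Sequence[float]) -> List[float]:
    """Project a 3D point onto the XY, XZ, and YZ planes and sum the features."""
    x, y, z = point
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    # Assumption: aggregation by summation (concatenation is another option).
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


if __name__ == "__main__":
    toy = {
        "xy": constant_plane(4, 1.0),
        "xz": constant_plane(4, 2.0),
        "yz": constant_plane(4, 3.0),
    }
    print(query_triplane(toy, (0.3, 0.7, 0.1)))
```

Because every lookup touches only three 2D planes, the representation can be processed with 2D convolutions, which is what makes it a convenient latent space for a diffusion model.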

Findings

01

Outperforms prior methods in shape generation quality

02

Enables applications like retargeting, outpainting, and editing

03

Efficient training via latent space encoding
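A rough count shows why a triplane latent is cheaper than a dense 3D grid. The sketch below only counts stored feature values; the resolution and channel numbers are illustrative assumptions, not figures from the paper, and the function names are hypothetical.

```python
def dense_grid_values(resolution: int, channels: int) -> int:
    """Number of feature values in a dense resolution^3 voxel grid."""
    return resolution ** 3 * channels


def triplane_values(resolution: int, channels: int) -> int:
    """Number of feature values in three resolution^2 feature planes."""
    return 3 * resolution ** 2 * channels


if __name__ == "__main__":
    res, ch = 128, 8  # hypothetical sizes, chosen only for illustration
    dense = dense_grid_values(res, ch)
    tri = triplane_values(res, ch)
    print(f"dense grid: {dense} values")
    print(f"triplane:   {tri} values")
    print(f"ratio:      {dense / tri:.1f}x")
```

At equal channel count the ratio is resolution / 3, so the saving grows linearly with resolution.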

Abstract

Synthesizing novel 3D models that resemble the input example has long been pursued by graphics artists and machine learning researchers. In this paper, we present Sin3DM, a diffusion model that learns the internal patch distribution from a single 3D textured shape and generates high-quality variations with fine geometry and texture details. Training a diffusion model directly in 3D would induce large memory and computational cost. Therefore, we first compress the input into a lower-dimensional latent space and then train a diffusion model on it. Specifically, we encode the input 3D textured shape into triplane feature maps that represent the signed distance and texture fields of the input. The denoising network of our diffusion model has a limited receptive field to avoid overfitting, and uses triplane-aware 2D convolution blocks to improve the result quality. Aside from randomly…
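The abstract notes that the denoising network has a limited receptive field to avoid overfitting. As a hedged illustration of what that means, the sketch below computes the receptive field of a plain stack of convolution layers using the standard recurrence (dilation is ignored). The layer configurations are hypothetical and are not the paper's architecture.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field (in input pixels) of stacked conv layers.

    Each layer is a (kernel_size, stride) pair. `jump` tracks how many input
    pixels separate two neighbouring outputs of the layers seen so far.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    # Hypothetical shallow network: four 3x3 convolutions, stride 1.
    shallow = [(3, 1)] * 4
    # Hypothetical network with one stride-2 layer, which widens the field.
    strided = [(3, 2), (3, 1), (3, 1)]
    print(receptive_field(shallow))
    print(receptive_field(strided))
```

A network whose receptive field covers only a fraction of the latent plane sees local patches rather than the whole shape, so it cannot simply memorize the single training example. Downsampling layers enlarge the field quickly, which is why a shallow, stride-free design keeps it small.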

Peer Reviews

Decision·ICLR 2024 poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

The overall idea of using latent diffusion models to aid in learning the internal distribution of a 3D shape and the specific design choices in the method (e.g., the shape representation and the steps taken to prevent overfitting) are clever. The results shown also look nice and consistent with minimal artifacts.

Weaknesses

It is mentioned several times that a unique benefit of the proposed method is that it inputs and outputs meshes. This is not the case, as instead the 3D representation is triplanes encoding signed distance as well as texture color. While this can indeed be converted to a mesh using marching cubes, it's not fair to claim that the method outputs a mesh. This brings me to my main concern with the paper. While the authors cite and briefly discuss [Li et al. 2023], no qualitative or quantitative comparis

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

The related work section covers relevant topics and helps to set the work in context. The authors tackle a novel problem that I was not aware of before. The results look plausible and claims are supported.

Weaknesses

One point that remains unclear to me is how the triplane-based representations can handle larger extensions of objects, e.g. in the example of the building. How can the triplane represent those? Do you increase the size? The authors claim that the model learns a distribution over patches. It is unclear to me how you choose the size of patches/receptive field and how you identify that it is not overfitting. I think there is more evaluation needed to show that it is not overfitting. Maybe an ab

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. The edited geometry after altering the noise is satisfactory. It has rich local variation while retaining the global structure. 2. The writing is clear.

Weaknesses

1. After training, the model could only generate 3D content with the same global structure. Since we already have the basic 3D mesh at the beginning, it seems that the application is a bit narrow and limited. Besides, achieving this goal requires training two models, an auto-encoder and a diffusion model. The outcome does not seem to match the effort. 2. The novelty of this work lies in the use of tri-planes as the input to the diffusion model and the introduction of tri-plane convolution. Both aspec

Code & Models

Repositories

sin3dm/sin3dm
PyTorch · Official

Taxonomy

Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion · Convolution