Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding
Pengkun Liu, Yikai Wang, Fuchun Sun, Jiafang Li, Hang Xiao, Hongxiang Xue, Xinzhou Wang

TL;DR
Isotropic3D is an image-to-3D generation method that conditions only on a single image CLIP embedding and relies on two-stage fine-tuning of a diffusion model, yielding more consistent, symmetrical, and high-quality 3D outputs.
Contribution
The paper presents an image-to-3D generation pipeline that takes only an image CLIP embedding as input and optimizes with the SDS loss alone, avoiding the hard L2 supervision on the reference image that most existing methods apply.
Findings
Produces multi-view consistent images and 3D models
Generates more symmetrical and well-proportioned 3D content
Shows less distortion while preserving similarity to the reference image
Abstract
Encouraged by the growing availability of pre-trained 2D diffusion models, image-to-3D generation by leveraging Score Distillation Sampling (SDS) is making remarkable progress. Most existing methods combine novel-view lifting from 2D diffusion models which usually take the reference image as a condition while applying hard L2 image supervision at the reference view. Yet heavily adhering to the image is prone to corrupting the inductive knowledge of the 2D diffusion model leading to flat or distorted 3D generation frequently. In this work, we reexamine image-to-3D in a novel perspective and present Isotropic3D, an image-to-3D generation pipeline that takes only an image CLIP embedding as input. Isotropic3D allows the optimization to be isotropic w.r.t. the azimuth angle by solely resting on the SDS loss. The core of our framework lies in a two-stage diffusion model fine-tuning. Firstly,…
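The abstract's reliance on the SDS loss alone can be illustrated with a toy example. The sketch below shows the commonly used form of an SDS update: noise a rendered view, ask a conditioned noise predictor for its estimate, and use the weighted difference as the gradient. It is not the authors' implementation. The names `sds_grad`, `make_toy_denoiser`, and `optimize` are hypothetical, the denoiser is a closed-form stand-in rather than a fine-tuned diffusion model, and `cond` is only a placeholder for an image CLIP embedding.

```python
import numpy as np


def sds_grad(rendered, cond, predict_noise, t, alphas_cumprod, rng):
    """Toy SDS gradient with respect to the rendered image."""
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(rendered.shape)            # noise injected at step t
    noisy = np.sqrt(a_bar) * rendered + np.sqrt(1.0 - a_bar) * eps
    eps_hat = predict_noise(noisy, t, cond)              # conditioned noise estimate
    weight = 1.0 - a_bar                                 # one common weighting choice
    return weight * (eps_hat - eps)


def make_toy_denoiser(target, alphas_cumprod):
    """Hypothetical stand-in for a diffusion model: it behaves as if the only
    plausible clean image were `target`, and ignores the conditioning vector."""
    def predict_noise(noisy, t, cond):
        a_bar = alphas_cumprod[t]
        return (noisy - np.sqrt(a_bar) * target) / np.sqrt(1.0 - a_bar)
    return predict_noise


def optimize(rendered, cond, predict_noise, alphas_cumprod, rng, steps=200, lr=0.5):
    """Plain gradient descent on the image itself; a real pipeline would
    backpropagate this gradient into the 3D representation's parameters."""
    x = rendered.copy()
    for _ in range(steps):
        t = rng.integers(len(alphas_cumprod))            # random timestep
        x = x - lr * sds_grad(x, cond, predict_noise, t, alphas_cumprod, rng)
    return x
```

Because no pixel-level loss against a reference view appears anywhere in this loop, every sampled view is treated the same way, which is the sense in which an SDS-only objective does not single out the reference azimuth. In this toy setting, calling `optimize` with a denoiser built around a fixed `target` pulls the image toward that target.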
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Medical Image Segmentation Techniques
Methods: Diffusion · Contrastive Language-Image Pre-training
