FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
Ruowen Zhao, Zhengyi Wang, Yikai Wang, Zihan Zhou, Jun Zhu

TL;DR
FlexiDreamer introduces a fast, end-to-end framework that reconstructs high-quality 3D meshes directly from multi-view images generated by diffusion models, avoiding the long training time of implicit representations and the visual artifacts of post-processing mesh extraction.
Contribution
The paper presents FlexiDreamer, a single image-to-3D pipeline that uses FlexiCubes, a gradient-based mesh optimization technique, to reconstruct 3D meshes directly and efficiently from generated multi-view images.
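As a rough illustration of what "gradient-based mesh optimization" means, the sketch below works on a single grid edge: a surface vertex is placed at the zero crossing of a linearly interpolated signed distance, and gradient descent on the two grid values moves that vertex to a target position. This is a toy under stated assumptions, not the FlexiCubes algorithm or the authors' code; the function names `crossing` and `fit_edge`, the squared-error loss, and all numeric values are hypothetical choices made for the example.

```python
# Toy illustration only: one grid edge, not the FlexiCubes algorithm.

def crossing(s0, s1):
    """Zero crossing of a linearly interpolated signed distance on the edge [0, 1]."""
    return s0 / (s0 - s1)

def fit_edge(s0, s1, target, lr=0.5, steps=100):
    """Gradient descent on the two grid values so the crossing lands on `target`."""
    for _ in range(steps):
        d = s0 - s1
        t = crossing(s0, s1)
        g = 2.0 * (t - target)        # d(loss)/dt for loss = (t - target)^2
        grad_s0 = g * (-s1 / d ** 2)  # chain rule through t = s0 / (s0 - s1)
        grad_s1 = g * (s0 / d ** 2)
        s0 -= lr * grad_s0
        s1 -= lr * grad_s1
    return s0, s1

# Start with the vertex at the edge midpoint and pull it toward 0.3.
s0, s1 = fit_edge(-0.5, 0.5, target=0.3)
print(crossing(s0, s1))
```

Because the vertex position is a differentiable function of the grid values, a loss measured on the surface can be pushed back to the grid. A full pipeline would measure that loss on rendered images of the whole mesh rather than on one vertex position.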
Findings
Generates high-fidelity 3D meshes in about 1 minute.
Outperforms previous methods in quality and speed.
Effectively reduces artifacts and surface noise.
Abstract
3D content generation has wide applications in various fields. One of its dominant paradigms is sparse-view reconstruction using multi-view images generated by diffusion models. However, since directly reconstructing triangle meshes from multi-view images is challenging, most methodologies opt for an implicit representation (such as NeRF) during the sparse-view reconstruction and acquire the target mesh by a post-processing extraction. Yet the implicit representation takes extensive time to train, and the post-extraction leads to undesirable visual artifacts. In this paper, we propose FlexiDreamer, a novel framework that directly reconstructs high-quality meshes from multi-view generated images. We utilize an advanced gradient-based mesh optimization, namely FlexiCubes, for multi-view mesh reconstruction, which enables us to generate 3D meshes in an end-to-end manner. To…
Taxonomy
Topics: Cell Image Analysis Techniques · Advanced Vision and Imaging · Image Processing Techniques and Applications
Methods: OPT · Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
