Hunyuan3D 1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
Xianghui Yang, Huiwen Shi, Bowen Zhang, Fan Yang, Jiacheng Wang, Hongxu Zhao, Xinhai Liu, Xinzhou Wang, Qingxiang Lin, Jiaao Yu, Lifu Wang, Jing Xu, Zebin He, Zhuo Chen, Sicong Liu, Junta Wu, Yihang Lian, Shaoxiong Yang, Yuhong Liu, Yong Yang, Di Wang, Jie Jiang, Chunchao Guo

TL;DR
Hunyuan3D 1.0 introduces a fast, two-stage unified framework for text- and image-conditioned 3D generation, significantly improving speed and quality over previous diffusion-based models.
Contribution
The paper presents a novel two-stage approach that combines multi-view diffusion with rapid feed-forward 3D reconstruction and supports both text and image conditioning within a single unified framework.
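The unified conditioning described above can be pictured as a single entry point that routes either a text prompt or an image into shared stages. This is a minimal sketch, not the Hunyuan3D 1.0 API. `TwoStagePipeline`, `text_to_image`, `multiview_diffusion`, and `reconstruct` are hypothetical names, and the idea that a text prompt is first turned into a conditioning image is an assumption that this summary does not state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Hypothetical placeholder types; real models would operate on image tensors and meshes.
Image = List[List[float]]
Views = List[Image]
Asset = Dict[str, int]


@dataclass
class TwoStagePipeline:
    """Hypothetical unified front end: a text prompt or an image goes in, a 3D asset comes out."""

    text_to_image: Callable[[str], Image]          # ASSUMPTION: text is first turned into an image
    multiview_diffusion: Callable[[Image], Views]  # stage 1: condition -> multi-view images
    reconstruct: Callable[[Views], Asset]          # stage 2: multi-view images -> 3D asset

    def __call__(self, condition: Union[str, Image]) -> Asset:
        # Both conditioning modes converge on one image condition,
        # so stages 1 and 2 are shared by text-to-3D and image-to-3D.
        image = self.text_to_image(condition) if isinstance(condition, str) else condition
        views = self.multiview_diffusion(image)
        return self.reconstruct(views)


# Dummy stages for illustration only; the view count of 6 is arbitrary.
pipe = TwoStagePipeline(
    text_to_image=lambda prompt: [[float(len(prompt))]],
    multiview_diffusion=lambda image: [image] * 6,
    reconstruct=lambda views: {"num_views": len(views)},
)
print(pipe("a wooden chair"))  # {'num_views': 6}
print(pipe([[0.5]]))           # {'num_views': 6}
```

Putting the modality-specific branch in front of stage 1 means the multi-view and reconstruction stages never need to know whether the request started as text or as an image.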
Findings
- Generates multi-view images in ~4 seconds.
- Reconstructs the 3D asset from those views in ~7 seconds.
- Balances speed and quality, at roughly 11 seconds for the two stages combined.
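The reconstruction stage consumes the output of the multi-view stage, so the two reported timings add up. The sketch below totals the approximate figures quoted above and shows one way to time a stage locally with `time.perf_counter`. It is a minimal illustration, not a benchmark harness. The hardware behind the reported numbers is not stated here, and `timed` and `end_to_end_budget` are hypothetical helper names.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Approximate per-stage latencies quoted in the summary, in seconds.
REPORTED_SECONDS: Dict[str, float] = {
    "multi_view_generation": 4.0,
    "reconstruction": 7.0,
}


def end_to_end_budget(stage_seconds: Dict[str, float]) -> float:
    """Sequential stages: total latency is the sum of the per-stage latencies."""
    return sum(stage_seconds.values())


def timed(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


total = end_to_end_budget(REPORTED_SECONDS)
print(total)  # 11.0
for stage, seconds in REPORTED_SECONDS.items():
    print(f"{stage}: {seconds / total:.0%} of the budget")

# Timing a stand-in "stage" locally; replace the lambda with a real stage call.
_, elapsed = timed(lambda n: sum(range(n)), 100_000)
print(f"stand-in stage took {elapsed:.4f} s")
```

By these figures, reconstruction accounts for the larger share (about 64%) of the two-stage budget.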
Abstract
While 3D generative models have greatly improved artists' workflows, existing diffusion models for 3D generation suffer from slow generation and poor generalization. To address these issues, we propose a two-stage approach named Hunyuan3D 1.0, comprising a lite version and a standard version, both of which support text- and image-conditioned generation. In the first stage, we employ a multi-view diffusion model that efficiently generates multi-view RGB images in approximately 4 seconds. These multi-view images capture rich details of the 3D asset from different viewpoints, relaxing the task from single-view to multi-view reconstruction. In the second stage, we introduce a feed-forward reconstruction model that rapidly and faithfully reconstructs the 3D asset from the generated multi-view images in approximately 7 seconds. The reconstruction network learns to handle noise and inconsistency…
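The multi-view stage matters because each generated image observes the object from a different camera. That is what turns single-view reconstruction into the better-constrained multi-view problem. The sketch below shows one way such viewpoints could be parameterized. It assumes a hypothetical layout that the excerpt does not specify: cameras evenly spaced in azimuth on a circle around an object at the origin, with a fixed elevation and the y-axis up. `orbit_cameras`, its defaults, and the count of six views are illustrative only.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def orbit_cameras(num_views: int, radius: float = 2.0,
                  elevation_deg: float = 0.0) -> List[Vec3]:
    """Return camera centers evenly spaced in azimuth around the origin.

    Hypothetical layout (not taken from the paper): every camera sits at the
    same distance and elevation and is assumed to look at the origin, y-axis up.
    """
    elev = math.radians(elevation_deg)
    cameras = []
    for i in range(num_views):
        azim = 2.0 * math.pi * i / num_views  # azimuth step of 360 / num_views degrees
        x = radius * math.cos(elev) * math.sin(azim)
        y = radius * math.sin(elev)
        z = radius * math.cos(elev) * math.cos(azim)
        cameras.append((x, y, z))
    return cameras


# Six views, 60 degrees apart (an arbitrary choice); each row is one camera center (x, y, z).
for center in orbit_cameras(6):
    print(tuple(round(c, 3) for c in center))
```

This excerpt does not say whether or how Hunyuan3D 1.0 feeds camera information to its reconstruction network. The sketch only illustrates what "different viewpoints" means geometrically.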
Code & Models
- 🤗 tencent/Hunyuan3D-2.1 (model) · 31k downloads · ♡ 884
- 🤗 tencent/HY3D-Bench (model) · 22 downloads · ♡ 43
- 🤗 tencent/Hunyuan3D-2mini (model) · 13k downloads · ♡ 127
- 🤗 tencent/Hunyuan3D-Omni (model) · 696 downloads · ♡ 163
- 🤗 tencent/Hunyuan3D-2 (model) · 95k downloads · ♡ 1724
- 🤗 tencent/Hunyuan3D-2mv (model) · 1.6k downloads · ♡ 333
- 🤗 moeqbadar/Hunyuan3D-2.1 (model) · ♡ 1
- 🤗 tencent/Hunyuan3D-1 (model) · 1.1k downloads · ♡ 310
- 🤗 camenduru/Hunyuan3D-1 (model) · 1 download
- 🤗 jobs-git/Hunyuan3D-2 (model)
Taxonomy
Topics: Image Processing and 3D Reconstruction
Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
