3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation
Zutao Jiang, Guansong Lu, Xiaodan Liang, Jihua Zhu, Wei Zhang, Xiaojun Chang, Hang Xu

TL;DR
This paper introduces 3D-TOGO, a novel model for text-guided cross-category 3D object generation that produces textured 3D neural radiance fields without time-consuming per-case optimization.
Contribution
The paper presents the first generic approach combining text-to-views and views-to-3D modules for efficient, high-quality 3D object generation guided by captions across multiple categories.
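The two-module design can be pictured as plain function composition: a caption goes into a text-to-views stage, and the resulting views go into a views-to-3D stage that returns a queryable 3D representation in one forward pass, with no per-caption optimization loop. The sketch below shows only this interface under stated assumptions. `generate_3d`, `RadianceField`, and `toy_text_to_views` are hypothetical names, and the "radiance field" is a toy nearest-view lookup, not the paper's actual networks.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical placeholder types: an image is a flat list of intensities,
# and a view pairs a camera angle (degrees) with an image.
Image = List[float]
View = Tuple[float, Image]


@dataclass
class RadianceField:
    """Toy stand-in for a view-conditioned 3D representation (hypothetical)."""
    views: Sequence[View]

    def query(self, angle: float) -> Image:
        # Toy behaviour only: return the source view with the closest camera angle.
        _, image = min(self.views, key=lambda v: abs(v[0] - angle))
        return image


def generate_3d(
    caption: str,
    text_to_views: Callable[[str], List[View]],
    views_to_3d: Callable[[List[View]], RadianceField],
) -> RadianceField:
    # Stage 1: caption -> several views of the target object.
    views = text_to_views(caption)
    # Stage 2: views -> 3D representation in a single pass (no per-case optimization).
    return views_to_3d(views)


def toy_text_to_views(caption: str) -> List[View]:
    # Hypothetical generator: four views whose "pixels" depend on the caption length.
    seed = float(len(caption))
    return [(angle, [seed + angle]) for angle in (0.0, 90.0, 180.0, 270.0)]


field = generate_3d("a wooden chair", toy_text_to_views, RadianceField)
print(field.query(100.0))  # nearest stored view is at 90 degrees
```

Keeping the two stages behind separate callables mirrors the decomposition described above: either module could be swapped out without changing how a caption becomes a 3D object.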
Findings
Outperforms existing methods in PSNR, SSIM, LPIPS, and CLIP-score.
Generates textured 3D objects without per-case optimization.
Effective across 98 categories in the ABO dataset.
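Of the reported metrics, PSNR is the simplest to state: $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$ in decibels, so higher means a rendered view is closer to its reference. The sketch below is a generic implementation of that standard formula, not the paper's evaluation code. It assumes images are flattened lists of intensities in $[0, 1]$ (so `MAX = 1`). SSIM, LPIPS, and CLIP-score require structural, learned-feature, or CLIP-embedding comparisons and are not shown.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


ref = [0.0, 0.5, 1.0, 0.25]
out = [0.1, 0.5, 0.9, 0.25]
print(round(psnr(ref, out), 2))  # MSE = 0.005, so 10 * log10(200) ≈ 23.01
```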
Abstract
Text-guided 3D object generation aims to generate 3D objects described by user-defined captions, which offers a flexible way to visualize what we imagine. Although some works have tackled this challenging task, they either use explicit 3D representations (e.g., meshes), which lack texture and require post-processing to render photo-realistic views, or require time-consuming optimization for every individual case. Here, we make the first attempt to achieve generic text-guided cross-category 3D object generation via a new 3D-TOGO model, which integrates a text-to-views generation module and a views-to-3D generation module. The text-to-views generation module is designed to generate different views of the target 3D object given an input caption. Prior-guidance, caption-guidance and view contrastive learning are proposed for achieving better…
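The abstract names view contrastive learning but does not give its loss. A common way to implement contrastive learning is the InfoNCE objective: pull an anchor embedding toward a positive and push it away from negatives. The sketch below shows that generic loss under an explicit assumption: the pairing (which view embeddings count as positives and negatives) and the temperature `0.1` are illustrative choices, not details taken from the paper. `cosine` and `info_nce` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Generic InfoNCE: -log(exp(s_pos / t) / sum over all candidates of exp(s / t))."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


# Assumed pairing: the anchor view embedding should match `aligned`,
# while the other embeddings act as negatives.
anchor = [1.0, 0.0]
aligned = [0.9, 0.1]
others = [[0.0, 1.0], [-1.0, 0.0]]
loss_good = info_nce(anchor, aligned, others)
loss_bad = info_nce(anchor, others[0], [aligned, others[1]])
print(loss_good < loss_bad)  # True: a well-aligned positive yields a lower loss
```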
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Image Processing and 3D Reconstruction · Computer Graphics and Visualization Techniques
Methods: Contrastive Learning
