Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation
Yiming Qin, Zhu Xu, Yang Liu

TL;DR
This paper introduces Hierarchical-Chain-of-Generation, an automated method that improves text-to-3D generation of complex objects by decomposing descriptions and hierarchically generating object parts with attribute accuracy.
Contribution
It proposes a novel hierarchical generation framework using large language models and Gaussian kernels to better handle complex attributes and occlusions in text-to-3D synthesis.
Findings
Produces structurally coherent 3D objects with complex attributes
Improves attribute binding accuracy in generated 3D models
Handles occlusions effectively through hierarchical part generation
Abstract
Recent text-to-3D models can render high-quality assets, yet they still stumble on objects with complex attributes. The key obstacles are: (1) existing text-to-3D approaches typically lift text-to-image models to extract semantics via text encoders, while the text encoder exhibits limited comprehension ability for long descriptions, leading to deviated cross-attention focus, subsequently wrong attribute binding in generated results. (2) Occluded object parts demand a disciplined generation order and explicit part disentanglement. Though some works introduce manual efforts to alleviate the above issues, their quality is unstable and highly reliant on manual information. To tackle above problems, we propose a automated method Hierarchical-Chain-of-Generation (HCoG). It leverages a large language model to decompose the long description into blocks representing different object parts, and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
Topics3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
