HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation
Zhiying Leng, Tolga Birdal, Xiaohui Liang, Federico Tombari

TL;DR
HyperSDFusion introduces a hyperbolic space-based dual-branch diffusion model that effectively captures hierarchical structures in text and 3D shapes, significantly improving text-to-shape generation quality.
Contribution
HyperSDFusion is the first method to use hyperbolic hierarchical representations for text-to-shape generation, integrating hyperbolic encoders and a hierarchical loss to enhance structural understanding.
Findings
Achieves state-of-the-art results on the Text2Shape dataset.
Effectively models hierarchical structures in text and 3D shapes.
Demonstrates the superiority of hyperbolic space in 3D shape generation.
Abstract
3D shape generation from text is a fundamental task in 3D representation learning. The text-shape pairs exhibit a hierarchical structure, where a general text like "chair" covers all 3D shapes of the chair, while more detailed prompts refer to more specific shapes. Furthermore, both text and 3D shapes are inherently hierarchical structures. However, existing Text2Shape methods, such as SDFusion, do not exploit that. In this work, we propose HyperSDFusion, a dual-branch diffusion model that generates 3D shapes from a given text. Since hyperbolic space is suitable for handling hierarchical data, we propose to learn the hierarchical representations of text and 3D shapes in hyperbolic space. First, we introduce a hyperbolic text-image encoder to learn the sequential and multi-modal hierarchical features of text in hyperbolic space. In addition, we design a hyperbolic text-graph convolution…
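The abstract's claim that hyperbolic space suits hierarchical data can be illustrated with a small sketch. It assumes the Poincaré ball model with curvature -1, which this summary does not specify, and the function names poincare_distance and exp_map_origin are hypothetical helpers, not the authors' code. The point it shows: the same Euclidean gap costs far more hyperbolic distance near the boundary of the ball than near the origin, so general concepts can sit near the center while many specific ones spread out toward the edge.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit
    Poincare ball (assumed curvature -1; not taken from the paper)."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)


def exp_map_origin(v):
    """Exponential map at the origin: sends a Euclidean (tangent) vector
    into the ball via tanh(|v|) * v / |v|."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(norm) / norm
    return [scale * x for x in v]


# Same Euclidean gap (0.1), measured near the center and near the boundary.
d_center = poincare_distance([0.0, 0.0], [0.1, 0.0])
d_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])

# A Euclidean feature vector projected into the ball (illustrative values).
embedded = exp_map_origin([3.0, 4.0])
```

Here d_boundary is several times larger than d_center, and embedded always has norm below 1 because tanh is bounded. A learned encoder would typically apply a map like exp_map_origin to its Euclidean outputs before measuring distances; whether HyperSDFusion does exactly this is not stated in the text above.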
Taxonomy
Topics: Natural Language Processing Techniques · Human Motion and Animation · Handwritten Text Recognition Techniques
Methods: Diffusion · Convolution
