ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts
Dmitry Petrov, Pradyumn Goyal, Divyansh Shivashok, Yuanming Tao, Melinos Averkiou, Evangelos Kalogerakis

TL;DR
ShapeWords is a novel method that integrates 3D shape information into text prompts to improve the accuracy and consistency of text-to-image synthesis, producing more shape-aware and text-compliant images.
Contribution
It introduces a new token embedding technique that incorporates 3D shape cues into text prompts, enhancing shape awareness in image generation.
Findings
Produces more shape-aware images with better 3D consistency.
Generates images that are more aligned with textual descriptions.
Maintains diversity and aesthetic quality in synthesized images.
Abstract
We introduce ShapeWords, an approach for synthesizing images based on 3D shape guidance and text prompts. ShapeWords incorporates target 3D shape information within specialized tokens embedded together with the input text, effectively blending 3D shape awareness with textual context to guide the image synthesis process. Unlike conventional shape guidance methods that rely on depth maps restricted to fixed viewpoints and often overlook full 3D structure or textual context, ShapeWords generates diverse yet consistent images that reflect both the target shape's geometry and the textual description. Experimental results show that ShapeWords produces images that are more text-compliant and aesthetically plausible, while also maintaining 3D shape awareness.
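The idea of embedding a shape-derived token alongside the prompt text can be illustrated with a toy sketch. This is not the authors' implementation: the placeholder token `<shape>`, the linear projection `weights`, the `strength` blending parameter, and the function names are all assumptions made for illustration, and the vectors are tiny hand-written lists rather than outputs of a real text or 3D encoder.

```python
from typing import List

Vector = List[float]


def project(shape_code: Vector, weights: List[Vector]) -> Vector:
    """Linearly map a shape code into token-embedding space (hypothetical projection)."""
    return [sum(w * x for w, x in zip(row, shape_code)) for row in weights]


def inject_shape_token(
    tokens: List[str],
    token_embeddings: List[Vector],
    shape_code: Vector,
    weights: List[Vector],
    placeholder: str = "<shape>",  # assumed placeholder name, not from the paper
    strength: float = 1.0,         # assumed blending knob: 0 keeps the text embedding
) -> List[Vector]:
    """Blend the projected shape code into the embedding of each placeholder token."""
    shape_vec = project(shape_code, weights)
    conditioned = []
    for tok, emb in zip(tokens, token_embeddings):
        if tok == placeholder:
            # Interpolate between the original embedding and the shape embedding.
            conditioned.append(
                [(1.0 - strength) * e + strength * s for e, s in zip(emb, shape_vec)]
            )
        else:
            # Ordinary text tokens pass through unchanged.
            conditioned.append(list(emb))
    return conditioned


# Toy data: 3-dimensional token embeddings and a 2-dimensional shape code.
tokens = ["a", "<shape>", "chair", "on", "a", "beach"]
token_embeddings = [[0.0, 0.0, 0.0] for _ in tokens]
shape_code = [1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3x2 projection matrix

conditioned = inject_shape_token(tokens, token_embeddings, shape_code, weights)
half = inject_shape_token(tokens, token_embeddings, shape_code, weights, strength=0.5)
```

In this sketch the conditioned sequence would then be passed to the image generator in place of the plain text embeddings. Because the shape information travels with the prompt rather than as a fixed-view depth map, the viewpoint is not pinned down by the conditioning signal.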
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction
