Expressive Text-to-Image Generation with Rich Text
Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang

TL;DR
This paper introduces a rich-text interface for text-to-image generation that gives users explicit control over local style, color, word importance, and region content, which plain-text prompts express poorly.
Contribution
It proposes a region-based diffusion method that uses rich-text attributes (font style, size, color, and footnote) to steer synthesis region by region, and reports better results than plain-text approaches.
Findings
Outperforms baseline methods in quantitative evaluations.
Enables explicit control over style and color in generated images.
Demonstrates detailed region-specific image synthesis capabilities.
Abstract
Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on attention maps of a diffusion process…
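To make the attribute-extraction and region steps described in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The `Span` class, its field names, and both function names are hypothetical, and the mapping from each format to a capability (font style to local style, size to token weight, color to target color, footnote to region description) is an assumption read off the order in which the abstract lists them. The per-pixel argmax in `assign_regions` is likewise an assumed stand-in for however the paper turns attention maps into regions, since the abstract is truncated at that point.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Span:
    """One run of rich text. All field names here are hypothetical."""
    text: str
    font: Optional[str] = None                     # assumed: font style -> local style
    size: Optional[float] = None                   # assumed: font size -> token weight
    color: Optional[Tuple[int, int, int]] = None   # assumed: RGB -> target color
    footnote: Optional[str] = None                 # assumed: footnote -> region description


def extract_attributes(spans):
    """Return the plain prompt and the attributes of every formatted span."""
    prompt = "".join(s.text for s in spans)
    attrs = []
    for s in spans:
        fields = (("font", s.font), ("size", s.size),
                  ("color", s.color), ("footnote", s.footnote))
        entry = {name: value for name, value in fields if value is not None}
        if entry:  # keep only spans that carry at least one attribute
            attrs.append({"text": s.text.strip(), **entry})
    return prompt, attrs


def assign_regions(attention_maps):
    """Label each pixel with the span whose attention score is highest there.

    attention_maps maps a span label to a 2D list of scores; all maps are
    assumed to share the same height and width. Argmax is an illustrative
    rule, not a claim about the paper's exact procedure.
    """
    labels = list(attention_maps)
    first = attention_maps[labels[0]]
    height, width = len(first), len(first[0])
    return [
        [max(labels, key=lambda name: attention_maps[name][y][x])
         for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    spans = [
        Span("a "),
        Span("church", color=(255, 0, 0)),
        Span(" in a "),
        Span("garden", footnote="a lush garden full of roses"),
    ]
    prompt, attrs = extract_attributes(spans)
    print(prompt)
    print(attrs)
    # Toy 1x2 attention maps, made up for illustration only.
    print(assign_regions({"church": [[0.9, 0.2]], "garden": [[0.1, 0.8]]}))
```

In a real pipeline the attention maps would come from the diffusion model's cross-attention layers, and each resulting region would be rendered under its own attributes. Those steps are outside what this sketch covers.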
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
