TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design
Yifan Gao, Jinpeng Lin, Min Zhou, Chuanbin Liu, Hongtao Xie, Tiezheng Ge, Yuning Jiang

TL;DR
TextPainter is a multimodal model that generates visually and semantically harmonious text images for posters by leveraging background style cues and text comprehension, supported by a new large-scale dataset.
Contribution
It introduces a novel multimodal approach combining visual harmony and text understanding for poster text image generation, along with the PosterT80K dataset for future research.
Findings
TextPainter produces high-quality, harmonious text images.
The model outperforms existing methods in visual and semantic coherence.
The PosterT80K dataset enables further advancements in multimodal text generation.
Abstract
Text design is one of the most critical procedures in poster design, because it relies heavily on human creativity and expertise to produce text images that account for both visual harmony and text semantics. This study introduces TextPainter, a novel multimodal approach that leverages contextual visual information and the corresponding text semantics to generate text images. Specifically, TextPainter takes the global-local background image as a style hint and uses it to guide text image generation toward visual harmony. Furthermore, we leverage a language model and introduce a text comprehension module to achieve both sentence-level and word-level style variations. We also construct the PosterT80K dataset, consisting of about 80K posters annotated with sentence-level bounding boxes and text contents. We hope this dataset will pave the way for further research on multimodal text image…
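To make the annotation scheme concrete, the sketch below shows one way a poster with sentence-level bounding boxes and text contents could be represented, and how a global and a local background region might be derived from it. The abstract does not specify the PosterT80K file format or how TextPainter crops its inputs, so every name here (`TextBox`, `PosterRecord`, `local_region`, `global_region`, the `margin` parameter) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box layout: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class TextBox:
    """One sentence-level annotation: where the text sits and what it says."""
    box: Box
    text: str


@dataclass
class PosterRecord:
    """A poster with its size and sentence-level annotations (assumed schema)."""
    width: int
    height: int
    sentences: List[TextBox]


def global_region(record: PosterRecord) -> Box:
    """The whole poster, used here as the 'global' background context."""
    return (0, 0, record.width, record.height)


def local_region(record: PosterRecord, index: int, margin: int = 16) -> Box:
    """The sentence box grown by `margin` pixels and clamped to the poster,
    used here as the 'local' background context around one sentence."""
    left, top, right, bottom = record.sentences[index].box
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(record.width, right + margin),
        min(record.height, bottom + margin),
    )


record = PosterRecord(
    width=800,
    height=1200,
    sentences=[TextBox(box=(10, 20, 300, 80), text="Summer Sale")],
)
print(global_region(record))    # whole-poster region
print(local_region(record, 0))  # padded region around the first sentence
```

Clamping the padded box to the poster bounds keeps the local region valid for sentences placed near an edge. The margin value is arbitrary and not taken from the paper.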
Taxonomy
Methods: Hierarchical Information Threading
