TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

TL;DR
TextDiffuser-2 leverages large language models within diffusion frameworks to improve text rendering by enabling automated layout planning, diverse style generation, and flexible modifications, validated through extensive experiments and user studies.
Contribution
The paper introduces a novel method that integrates large language models with diffusion models for automated, flexible, and diverse text rendering.
Findings
Achieves more rational text layouts and diverse styles
Outperforms previous methods in flexibility and automation
Validated by human and GPT-4V user studies
Abstract
The diffusion model has been proven a powerful generative model in recent years, yet remains a challenge in generating visual text. Several methods alleviated this issue by incorporating explicit text position and content as guidance on where and what text to render. However, these methods still suffer from several drawbacks, such as limited flexibility and automation, constrained capability of layout prediction, and restricted style diversity. In this paper, we present TextDiffuser-2, aiming to unleash the power of language models for text rendering. Firstly, we fine-tune a large language model for layout planning. The large language model is capable of automatically generating keywords for text rendering and also supports layout modification through chatting. Secondly, we utilize the language model within the diffusion model to encode the position and texts at the line level. Unlike…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHandwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation
MethodsDiffusion
