TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

Jingye Chen; Yupan Huang; Tengchao Lv; Lei Cui; Qifeng Chen; Furu Wei

arXiv:2311.16465 · cs.CV · November 29, 2023 · 2 cites

TL;DR

TextDiffuser-2 leverages large language models within diffusion frameworks to improve text rendering by enabling automated layout planning, diverse style generation, and flexible modifications, validated through extensive experiments and user studies.

Contribution

The paper introduces a novel method that integrates large language models with diffusion models for automated, flexible, and diverse text rendering.

Findings

01 Achieves more rational text layouts and diverse styles

02 Outperforms previous methods in flexibility and automation

03 Validated by human and GPT-4V user studies

Abstract

Diffusion models have proven to be powerful generative models in recent years, yet generating visual text remains a challenge for them. Several methods have alleviated this issue by incorporating explicit text position and content as guidance on where and what text to render. However, these methods still suffer from several drawbacks, such as limited flexibility and automation, constrained capability of layout prediction, and restricted style diversity. In this paper, we present TextDiffuser-2, which aims to unleash the power of language models for text rendering. First, we fine-tune a large language model for layout planning. The large language model can automatically generate keywords for text rendering and also supports layout modification through chat. Second, we use the language model within the diffusion model to encode the position and text at the line level. Unlike…
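To make the idea of encoding position and text at the line level more concrete, here is a minimal Python sketch. It is not the authors' implementation: the `TextLine` class, the `quantize` and `encode_layout` helpers, the `<l..> <t..> <r..> <b..>` token format, the `<sep>` separator, the 512-pixel canvas, and the 128 coordinate bins are all assumptions chosen for illustration. The sketch only shows how each text line and its bounding box could be flattened into one string that a language model encoder might consume.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    """One line of text to render, with a hypothetical pixel bounding box."""
    text: str
    box: tuple[int, int, int, int]  # (left, top, right, bottom)


def quantize(value: int, image_size: int = 512, bins: int = 128) -> int:
    """Map a pixel coordinate to one of `bins` discrete position ids.

    The canvas size and bin count are illustrative assumptions.
    """
    value = max(0, min(value, image_size - 1))  # clamp to the canvas
    return value * bins // image_size


def encode_layout(prompt: str, lines: list[TextLine]) -> str:
    """Serialize a prompt plus line-level boxes into a single string.

    Each line contributes four position tokens followed by its text.
    The token spelling is hypothetical, not taken from the paper.
    """
    parts = [prompt]
    for line in lines:
        left, top, right, bottom = (quantize(v) for v in line.box)
        parts.append(f"<l{left}> <t{top}> <r{right}> <b{bottom}> {line.text}")
    return " <sep> ".join(parts)


if __name__ == "__main__":
    layout = [TextLine("Hello World", (64, 32, 448, 96))]
    print(encode_layout("a poster", layout))
```

Keeping a whole line of text attached to a single box, rather than one box per character, is the property this sketch is meant to illustrate; everything else about the format would depend on the actual model.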

Taxonomy

Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation

Methods: Diffusion