JoyType: A Robust Design for Multilingual Visual Text Creation

Chao Li; Chen Jiang; Xiaolong Liu; Jun Zhao; Guoxin Wang

arXiv:2409.17524·cs.CV·March 31, 2025

Open Access

TL;DR

JoyType is a new multilingual visual text creation model that maintains font styles, including small fonts, during image generation by using a specialized control network and OCR-aware loss, outperforming existing methods.

Contribution

Introduces JoyType, a novel approach that pairs a 1-million-pair training dataset (JoyType-1M) with a font control network to improve font style preservation in diffusion-based image generation.

Findings

1. Outperforms state-of-the-art methods in font style accuracy
2. Effectively generates small-font text in images
3. Can be integrated as a plugin with other diffusion models

Abstract

Generating images with accurately represented text, especially in non-Latin languages, poses a significant challenge for diffusion models. Existing approaches, such as the integration of hint condition diagrams via auxiliary networks (e.g., ControlNet), have made strides towards addressing this issue. However, diffusion models often fall short in tasks requiring controlled text generation, such as specifying particular fonts or producing text in small fonts. In this paper, we introduce a novel approach for multilingual visual text creation, named JoyType, designed to maintain the font style of text during the image generation process. Our methodology begins with assembling a training dataset, JoyType-1M, comprising 1 million pairs of data. Each pair includes an image, its description, and glyph instructions corresponding to the font style within the image. We then developed a text…
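As an illustration of the dataset description above, here is a minimal sketch of how one JoyType-1M-style record could be represented. The class name, field names, and file paths are all hypothetical; the abstract only states that each record holds an image, its description, and glyph instructions, and does not specify a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSample:
    """One hypothetical JoyType-1M-style record (all field names are assumptions)."""
    image_path: str        # the image that contains rendered text
    description: str       # natural-language description of the image
    glyph_hint_path: str   # glyph instruction matching the font style in the image


def is_complete(sample: TrainingSample) -> bool:
    """Return True when all three components of the record are non-empty."""
    fields = (sample.image_path, sample.description, sample.glyph_hint_path)
    return all(field.strip() for field in fields)


# Example record with made-up paths and text, for illustration only.
sample = TrainingSample(
    image_path="images/000001.jpg",
    description="A storefront sign with red lettering",
    glyph_hint_path="glyphs/000001.png",
)
```

A check such as `is_complete(sample)` could be used to filter out records that are missing any of the three components before training.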

Taxonomy

Topics: Digital Storytelling and Education

Methods: Diffusion · Hierarchical Information Threading