EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering

Runnan Lu; Yuxuan Zhang; Jiaming Liu; Haofan Wang; Yiren Song

arXiv:2505.24417 · cs.CV · March 11, 2026


TL;DR

EasyText introduces a diffusion transformer-based framework for controllable, high-quality multilingual text rendering, leveraging large-scale synthetic datasets and novel encoding techniques to improve accuracy and layout control.

Contribution

The paper presents EasyText, a novel diffusion transformer framework that uses character positioning encoding and position encoding interpolation for precise multilingual text rendering.

Findings

01. Effective multilingual text rendering demonstrated
02. High visual quality and layout control achieved
03. Large-scale synthetic datasets enhance training

Abstract

Generating accurate multilingual text with diffusion models has long been desired but remains challenging. Recent methods have made progress in rendering text in a single language, but rendering text in arbitrary languages remains largely unexplored. This paper introduces EasyText, a text rendering framework based on DiT (Diffusion Transformer), which connects the denoising latents with multilingual text encoded as character tokens. We propose character positioning encoding and position encoding interpolation techniques to achieve controllable and precise text rendering. Additionally, we construct a large-scale synthetic text image dataset with 1 million multilingual image-text annotations, as well as a high-quality dataset of 20K annotated images, used for pretraining and fine-tuning respectively. Extensive experiments and evaluations demonstrate the effectiveness and…
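The abstract does not spell out how position encoding interpolation works. A minimal sketch, assuming a FLUX-style multimodal DiT in which condition tokens and image latents share a 2D rotary position encoding (RoPE), is to give the tokens of a rendered glyph image fractional (y, x) position indices that linearly span the target text box on the latent grid. The names `Box`, `interpolate_char_positions`, and `rope_angles` are hypothetical; this is an illustration of the general idea, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical target text region on the latent grid (latent-cell units)."""
    y0: float
    x0: float
    y1: float
    x1: float


def interpolate_char_positions(h: int, w: int, box: Box) -> list[tuple[float, float]]:
    """Give each token of an h x w character-image token grid a fractional
    (y, x) position index that linearly spans `box`, so the character tokens
    share position encodings with the latent region where the text should go.
    Assumption: patch centres are mapped; the paper's exact mapping may differ.
    """
    positions = []
    for i in range(h):
        for j in range(w):
            # Patch centres (i + 0.5) / h spread the grid evenly over the box.
            y = box.y0 + (i + 0.5) / h * (box.y1 - box.y0)
            x = box.x0 + (j + 0.5) / w * (box.x1 - box.x0)
            positions.append((y, x))
    return positions


def rope_angles(pos: float, dim: int, base: float = 10000.0) -> list[float]:
    """Standard 1D rotary-embedding angles pos * base^(-2k/dim) for k < dim/2.

    They are defined for any real `pos`, so fractional (interpolated)
    position indices can be fed to a RoPE-based DiT without change.
    """
    return [pos * base ** (-2 * k / dim) for k in range(dim // 2)]


# A 2 x 4 grid of character tokens placed in rows 8-12, columns 16-32
# of a 64 x 64 latent grid.
pos = interpolate_char_positions(2, 4, Box(y0=8, x0=16, y1=12, x1=32))
print(pos[0], pos[-1])  # (9.0, 18.0) (11.0, 30.0)
y_angles = rope_angles(pos[0][0], dim=8)
```

Linear interpolation keeps neighbouring glyph tokens adjacent in position space while letting one glyph image be stretched or shrunk to any box size; whether EasyText maps patch centres, corners, or uses a different scheme is an assumption here.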

Code & Models

Repositories

songyiren725/easytext · PyTorch · Official

Videos

EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering (Underline)

Taxonomy

Topics: Human Motion and Animation · Computer Graphics and Visualization Techniques · Video Analysis and Summarization

Methods: Diffusion