TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis

Kazi Mahathir Rahman; Showrin Rahman; Sharmin Sultana Srishty

arXiv:2505.19291 · cs.CV · November 11, 2025

TL;DR

This paper introduces TextDiffuser-RL, a two-stage framework that uses reinforcement learning to optimize text layouts for diffusion-based text-to-image synthesis, producing high-fidelity images with embedded text efficiently on both CPU and GPU platforms.

Contribution

It presents a novel RL-based text layout generation method, integrated with a diffusion model, that substantially improves efficiency and flexibility over existing approaches.

Findings

1. Achieves image quality comparable to TextDiffuser-2.
2. Runs 42.29 times faster than previous methods.
3. Requires only 2 MB of CPU RAM for inference.
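
To put the speedup in perspective, read "42.29 times faster" as a ratio of wall-clock runtimes. This reading is an assumption: the summary does not say which stage or which baseline is timed. Under it, the new runtime is about 2.36% of the baseline, a reduction of roughly 97.64%:

```latex
\frac{T_{\text{new}}}{T_{\text{old}}} = \frac{1}{42.29} \approx 0.0236,
\qquad
1 - \frac{T_{\text{new}}}{T_{\text{old}}} \approx 0.9764
```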

Abstract

Text-embedded image generation plays a critical role in industries such as graphic design, advertising, and digital content creation. Text-to-Image generation methods leveraging diffusion models, such as TextDiffuser-2, have demonstrated promising results in producing images with embedded text. TextDiffuser-2 effectively generates bounding box layouts that guide the rendering of visual text, achieving high fidelity and coherence. However, existing approaches often rely on resource-intensive processes and are limited in their ability to run efficiently on both CPU and GPU platforms. To address these challenges, we propose a novel two-stage pipeline that integrates reinforcement learning (RL) for rapid and optimized text layout generation with a diffusion-based image synthesis model. Our RL-based approach significantly accelerates the bounding box prediction step while reducing overlaps,…
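
The abstract says the RL stage predicts bounding boxes while reducing overlaps, but it does not give the reward. The sketch below is a hedged illustration only, not the authors' method. It scores a candidate layout with a hypothetical reward that penalizes pairwise box overlap and any box area falling off the canvas. The names `layout_reward`, `overlap_weight`, and `bounds_weight` and the 512×512 canvas are all assumptions. An RL policy would be trained to maximize a signal of this kind.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) with x0 < x1 and y0 < y1, in pixels.
Box = Tuple[float, float, float, float]


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two axis-aligned boxes (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def out_of_bounds_area(box: Box, width: float, height: float) -> float:
    """Portion of the box's area that lies outside the canvas."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    inside = intersection_area(box, (0.0, 0.0, width, height))
    return area - inside


def layout_reward(
    boxes: List[Box],
    width: float = 512.0,
    height: float = 512.0,
    overlap_weight: float = 1.0,  # hypothetical weighting, not from the paper
    bounds_weight: float = 1.0,   # hypothetical weighting, not from the paper
) -> float:
    """Hypothetical reward: 0.0 for a clean layout, more negative as boxes
    overlap each other or spill off the canvas (normalized by canvas area)."""
    canvas = width * height
    # Sum overlap over every unordered pair of boxes.
    overlap = sum(
        intersection_area(boxes[i], boxes[j])
        for i in range(len(boxes))
        for j in range(i + 1, len(boxes))
    )
    spill = sum(out_of_bounds_area(b, width, height) for b in boxes)
    penalty = overlap_weight * overlap + bounds_weight * spill
    return 0.0 - penalty / canvas


if __name__ == "__main__":
    clean = [(32.0, 32.0, 480.0, 96.0), (32.0, 400.0, 300.0, 460.0)]
    overlapping = [(32.0, 32.0, 480.0, 96.0), (100.0, 64.0, 400.0, 128.0)]
    print(layout_reward(clean), layout_reward(overlapping))
```

With these weights, a layout whose boxes are disjoint and fully on the canvas scores 0.0. The two overlapping boxes in the example share a 300×32 region and score -9600/(512·512). Normalizing by canvas area keeps penalties comparable across canvas sizes.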

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques

Methods: Diffusion