Layout Agnostic Scene Text Image Synthesis with Diffusion Models
Qilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, Praveen Krishnan

TL;DR
SceneTextGen is a diffusion-based model that generates scene text images without relying on predefined layouts, enabling more diverse and natural text representations with improved recognition accuracy.
Contribution
It introduces a layout-agnostic diffusion model with character-level encoding and segmentation components for more varied and accurate scene text image synthesis.
Findings
Improved character recognition rates on multiple datasets
Enhanced diversity in text styles and fonts
Outperforms standard diffusion and text-specific methods
Abstract
While diffusion models have significantly advanced the quality of image generation, their capability to accurately and coherently render text within these images remains a substantial challenge. Conventional diffusion-based methods for scene text generation are typically limited by their reliance on an intermediate layout output. This dependency often results in a constrained diversity of text styles and fonts, an inherent limitation stemming from the deterministic nature of the layout generation phase. To address these challenges, this paper introduces SceneTextGen, a novel diffusion-based model specifically designed to circumvent the need for a predefined layout stage. By doing so, SceneTextGen facilitates a more natural and varied representation of text. The novelty of SceneTextGen lies in its integration of three key components: a character-level encoder for capturing detailed…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Image Processing and 3D Reconstruction
Methods: Diffusion
