DiffusionSTR: Diffusion Model for Scene Text Recognition
Masato Fujitake

TL;DR
This paper introduces DiffusionSTR, a scene text recognition framework that applies diffusion models to recognize text in images and achieves accuracy competitive with existing methods.
Contribution
It is the first work to apply diffusion models to scene text recognition, recasting the task as a text-to-text transformation conditioned on the image rather than an image-to-text transformation.
Findings
Achieves accuracy competitive with state-of-the-art methods on publicly available datasets
First application of diffusion models to text recognition
Reframes scene text recognition as a diffusion-based text-to-text task conditioned on the image
Abstract
This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework using diffusion models for recognizing text in the wild. While existing studies have viewed the scene text recognition task as an image-to-text transformation, we rethought it as a text-text one under images in a diffusion model. We show for the first time that the diffusion model can be applied to text recognition. Furthermore, experimental results on publicly available datasets show that the proposed method achieves competitive accuracy compared to state-of-the-art methods.
Taxonomy
Topics: Text and Document Classification Technologies · Image Retrieval and Classification Techniques · Handwritten Text Recognition Techniques
Methods: Diffusion
