DiffusionSTR: Diffusion Model for Scene Text Recognition

Masato Fujitake

arXiv:2306.16707·cs.CV·June 30, 2023

TL;DR

This paper introduces DiffusionSTR, a scene text recognition framework that applies diffusion models to recognize text in images and achieves accuracy competitive with existing methods.

Contribution

It is the first to apply diffusion models to scene text recognition, recasting the task as a text-to-text transformation conditioned on the image.

Findings

01 Achieves competitive accuracy on public datasets

02 First application of diffusion models to text recognition

03 Reframes scene text recognition as a diffusion-based text-to-text task

Abstract

This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework that uses diffusion models to recognize text in the wild. While existing studies have viewed scene text recognition as an image-to-text transformation, we recast it as a text-to-text transformation conditioned on the image within a diffusion model. We show for the first time that a diffusion model can be applied to text recognition. Furthermore, experimental results on publicly available datasets show that the proposed method achieves accuracy competitive with state-of-the-art methods.
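
The text-to-text framing can be illustrated with a toy sketch. The abstract does not specify the noise process, the schedule, or the network, so everything below is an assumption made for illustration: the uniform character corruption, the linear schedule in `keep_probability`, and the hypothetical `toy_denoiser`, which stands in for a learned image-conditioned model by simply reading a label stored with the "image". It is not the paper's implementation.

```python
import random

# Assumed character set; the paper's actual vocabulary is not given in the abstract.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def keep_probability(t, num_steps):
    """Assumed linear schedule: chance that a character is still clean at step t."""
    return 1.0 - t / num_steps


def corrupt(text, t, num_steps, rng):
    """Forward process (assumed): replace each character independently with a
    uniformly random symbol with probability 1 - keep_probability(t)."""
    keep = keep_probability(t, num_steps)
    return "".join(
        c if rng.random() < keep else rng.choice(ALPHABET) for c in text
    )


def recognize(image, denoiser, length, num_steps, rng):
    """Reverse process: start from random text and refine it step by step.
    Every step maps noisy text to cleaner text, conditioned on the image."""
    text = "".join(rng.choice(ALPHABET) for _ in range(length))
    for t in range(num_steps, 0, -1):
        predicted_clean = denoiser(text, image, t)
        # Re-noise the prediction to the level of step t - 1 (no noise at step 0).
        text = corrupt(predicted_clean, t - 1, num_steps, rng)
    return text


def toy_denoiser(noisy_text, image, t):
    """Hypothetical stand-in for a trained network: returns the label stored
    with the toy image instead of predicting it from pixels."""
    return image["label"]


if __name__ == "__main__":
    rng = random.Random(0)
    print(corrupt("coffee", 5, 10, rng))  # partially corrupted text
    print(recognize({"label": "coffee"}, toy_denoiser, 6, 10, rng))
```

The point of the sketch is the shape of the loop: the input and output of each step are both text, and the image enters only as a conditioning argument, which is what distinguishes this view from a direct image-to-text mapping.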

Taxonomy

Topics: Text and Document Classification Technologies · Image Retrieval and Classification Techniques · Handwritten Text Recognition Techniques

Methods: Diffusion