Scene Text Image Super-resolution based on Text-conditional Diffusion Models
Chihiro Noguchi, Shun Fukuda, Masao Yamanaka

TL;DR
This paper applies text-conditional diffusion models to scene text image super-resolution (STISR), where they notably surpass existing methods, and uses them to synthesize LR-HR paired text image datasets that improve training for downstream scene text recognition.
Contribution
The paper uses text-conditional diffusion models both for super-resolution itself and for synthesizing LR-HR paired training data. The models surpass existing STISR methods, and the synthesized pairs improve STISR training.
Findings
Text-conditional diffusion models (DMs) outperform existing STISR methods.
Synthesized LR-HR image pairs improve STISR training.
Proposed framework enhances scene text recognition accuracy.
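The recognition-accuracy finding above can be made concrete with a small sketch of word-level accuracy, a common way to score a recognizer run on super-resolved images. This is an illustration only, not the paper's evaluation code: the function names `normalize` and `word_accuracy` are hypothetical, and the normalization (lowercasing and keeping only letters and digits) is an assumed convention rather than one confirmed by the source.

```python
import string

# Assumed convention: compare case-insensitively on a-z and 0-9 only.
_KEEP = set(string.ascii_lowercase + string.digits)


def normalize(text: str) -> str:
    """Lowercase the text and drop every character outside a-z and 0-9."""
    return "".join(ch for ch in text.lower() if ch in _KEEP)


def word_accuracy(predictions, ground_truths):
    """Return the fraction of predictions that exactly match their label
    after normalization. Both arguments are equal-length lists of strings."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        return 0.0
    correct = sum(
        normalize(p) == normalize(g) for p, g in zip(predictions, ground_truths)
    )
    return correct / len(ground_truths)


if __name__ == "__main__":
    # Hypothetical recognizer outputs on two super-resolved images.
    preds = ["Hello", "w0rld"]
    labels = ["hello!", "world"]
    print(word_accuracy(preds, labels))
```

Comparing this score for a recognizer fed LR images against the same recognizer fed super-resolved images is one way to quantify the benefit of an STISR model.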
Abstract
Scene Text Image Super-resolution (STISR) has recently achieved great success as a preprocessing method for scene text recognition. STISR aims to transform blurred and noisy low-resolution (LR) text images in real-world settings into clear high-resolution (HR) text images suitable for scene text recognition. In this study, we leverage text-conditional diffusion models (DMs), known for their impressive text-to-image synthesis capabilities, for STISR tasks. Our experimental results revealed that text-conditional DMs notably surpass existing STISR methods. Especially when texts from LR text images are given as input, the text-conditional DMs are able to produce superior quality super-resolution text images. Utilizing this capability, we propose a novel framework for synthesizing LR-HR paired text image datasets. This framework consists of three specialized text-conditional DMs, each…
Videos
Scene Text Image Super-Resolution Based on Text-Conditional Diffusion Models (YouTube)
Taxonomy
Topics: Advanced Image Processing Techniques · Image and Signal Denoising Methods
Methods: Diffusion
