TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution

Baolin Liu; Zongyuan Yang; Pengfei Wang; Junjie Zhou; Ziqi Liu; Ziyi Song; Yan Liu; Yongping Xiong

arXiv:2308.06743 · cs.CV · March 18, 2025 · 1 citation

TL;DR

TextDiff is a diffusion-based framework for scene text image super-resolution whose mask-guided residual module sharpens text edges, improving text clarity and recognizability without additional joint training.

Contribution

The paper presents the first diffusion-based framework tailored for scene text image super-resolution, featuring a mask-guided residual diffusion module that sharpens text edges.

Findings

01. Achieves state-of-the-art performance on benchmark datasets.

02. Sharpens text edges without additional joint training.

03. Improves the readability and recognizability of scene text images.

Abstract

The goal of scene text image super-resolution is to reconstruct high-resolution text-line images from unrecognizable low-resolution inputs. Existing methods that optimize pixel-level losses tend to produce noticeably blurred text edges, which substantially harms both the readability and the recognizability of the text. To address this, we propose TextDiff, the first diffusion-based framework tailored for scene text image super-resolution. It contains two modules: the Text Enhancement Module (TEM) and the Mask-Guided Residual Diffusion Module (MRD). The TEM generates an initial deblurred text image and a mask that encodes the spatial location of the text. The MRD sharpens text edges by modeling the residuals between the ground-truth images and the initial deblurred images. Extensive experiments…
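The residual idea in the abstract (diffuse the difference between the ground truth and the TEM's initial deblurred image, then add it back) can be illustrated with a toy sketch. It assumes a standard DDPM forward process with a linear beta schedule applied to the residual r_0 = HR - deblurred. The schedule, the flat-list image representation, and the function names (`linear_alpha_bars`, `residual_target`, `q_sample`, `reconstruct`) are hypothetical choices for illustration, not the paper's code; the mask-guided denoising network itself is not modeled.

```python
import math
import random

# Toy sketch of residual diffusion on flattened grayscale images.
# Assumptions (not from the TextDiff paper): images are flat lists of floats,
# a linear DDPM beta schedule is used, and the denoiser is not modeled.


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def residual_target(hr, deblurred):
    """Residual r_0 between the ground-truth HR image and the initial deblurred image."""
    return [h - d for h, d in zip(hr, deblurred)]


def q_sample(r0, t, alpha_bars, noise):
    """Forward process on the residual: r_t = sqrt(a_bar_t) * r_0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alpha_bars[t]
    return [math.sqrt(a_bar) * r + math.sqrt(1.0 - a_bar) * e for r, e in zip(r0, noise)]


def reconstruct(deblurred, residual):
    """Add a (predicted) residual back onto the initial deblurred image."""
    return [d + r for d, r in zip(deblurred, residual)]


if __name__ == "__main__":
    rng = random.Random(0)
    hr = [0.0, 1.0, 1.0, 0.0]          # toy sharp text edge (ground truth)
    deblurred = [0.1, 0.8, 0.9, 0.2]   # toy blurrier initial estimate
    r0 = residual_target(hr, deblurred)
    alpha_bars = linear_alpha_bars(1000)
    noise = [rng.gauss(0.0, 1.0) for _ in r0]
    r_t = q_sample(r0, 500, alpha_bars, noise)
    print("residual:", [round(x, 3) for x in r0])
    print("noisy residual at t=500:", [round(x, 3) for x in r_t])
    print("reconstruction:", [round(x, 3) for x in reconstruct(deblurred, r0)])
```

At inference, a trained denoiser (guided, in TextDiff, by the text mask from the TEM) would run the reverse process to estimate r_0, and `reconstruct` adds that estimate back onto the initial deblurred image.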

Code & Models

Repositories

lenubolim/textdiff
PyTorch · Official

Taxonomy

Topics: Advanced Image Processing Techniques · Image and Signal Denoising Methods

Methods: Diffusion