TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance

Keren Ye; Ignacio Garcia Dorado; Michalis Raptis; Mauricio Delbracio; Irene Zhu; Peyman Milanfar; Hossein Talebi

arXiv:2505.23119 · cs.CV · May 30, 2025

TL;DR

TextSR is a multimodal diffusion model that improves multilingual scene text image super-resolution by integrating OCR-guided text priors, leading to more accurate and legible text reconstruction in challenging images.

Contribution

The paper introduces TextSR, a novel diffusion-based super-resolution model that incorporates OCR and text priors to enhance multilingual scene text image quality.

Findings

01 Outperforms existing methods on TextZoom and TextVQA datasets

02 Effectively localizes text regions and models multilingual character shapes

03 Enhances text legibility and reduces hallucinated textures

Abstract

While recent advancements in Image Super-Resolution (SR) using diffusion models have shown promise in improving overall image quality, their application to scene text images has revealed limitations. These models often struggle with accurate text region localization and fail to effectively model image and multilingual character-to-shape priors. This leads to inconsistencies, the generation of hallucinated textures, and a decrease in the perceived quality of the super-resolved text. To address these issues, we introduce TextSR, a multimodal diffusion model specifically designed for Multilingual Scene Text Image Super-Resolution. TextSR leverages a text detector to pinpoint text regions within an image and then employs Optical Character Recognition (OCR) to extract multilingual text from these areas. The extracted text characters are then transformed into visual shapes using a UTF-8…
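
The abstract is cut off right after "UTF-8", so the paper's actual encoder is not visible here. As a rough illustration only, the sketch below shows how recognized multilingual strings map to UTF-8 byte sequences, which is one reason a byte-level representation can cover any script with a fixed vocabulary of 256 values. The `TextRegion` type, its `box` layout, and the helper names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextRegion:
    """Hypothetical container for one detected text region and its OCR output."""
    box: Tuple[int, int, int, int]  # assumed layout: (x0, y0, x1, y1) in pixels
    text: str                       # string recognized by OCR in that region


def utf8_byte_ids(text: str) -> List[int]:
    """Return the UTF-8 encoding of `text` as a list of integers in [0, 255]."""
    return list(text.encode("utf-8"))


def encode_regions(regions: List[TextRegion]) -> List[Tuple[Tuple[int, int, int, int], List[int]]]:
    """Pair each region's box with the UTF-8 byte ids of its recognized text."""
    return [(r.box, utf8_byte_ids(r.text)) for r in regions]


# Made-up OCR results covering two scripts (Latin with an accent, and CJK).
regions = [
    TextRegion(box=(0, 0, 40, 12), text="Caf\u00e9"),
    TextRegion(box=(0, 20, 40, 32), text="\u51fa\u53e3"),
]
encoded = encode_regions(regions)
```

ASCII characters take one byte each, while the accented and CJK characters in this example take two and three bytes respectively, so sequence length varies by script even when the character count is the same.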

Taxonomy

Methods: Diffusion