Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement

Hang Guo; Tao Dai; Guanghao Meng; Shu-Tao Xia

arXiv:2307.09749 · cs.CV · August 1, 2023 · 1 cites

TL;DR

This paper introduces LEMMA, a novel scene text image super-resolution method that explicitly models character regions and employs multi-modal alignment to improve image quality and recognition accuracy, outperforming existing methods.

Contribution

The paper proposes a new approach that explicitly models character regions and uses multi-modal alignment for enhanced scene text image super-resolution.

Findings

01. LEMMA outperforms state-of-the-art methods on TextZoom and four recognition benchmarks.

02. The location enhancement module effectively extracts character region features.

03. The adaptive fusion module improves the integration of visual and semantic guidance.

Abstract

Scene text image super-resolution (STISR), aiming to improve image quality while boosting downstream scene text recognition accuracy, has recently achieved great success. However, most existing methods treat the foreground (character regions) and background (non-character regions) equally in the forward process, and neglect the disturbance from the complex background, thus limiting the performance. To address these issues, in this paper, we propose a novel method LEMMA that explicitly models character regions to produce high-level text-specific guidance for super-resolution. To model the location of characters effectively, we propose the location enhancement module to extract character region features based on the attention map sequence. Besides, we propose the multi-modal alignment module to perform bidirectional visual-semantic alignment to generate high-quality prior guidance, which…
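
To make the idea of extracting character region features from an attention map sequence more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the attention maps are turned into a foreground selection, so the fixed threshold, the function names `foreground_mask` and `select_character_features`, and the toy data are all assumptions made for illustration. The sketch collapses one attention map per character into a binary mask, then keeps only the feature vectors at the masked pixels.

```python
from typing import List

Map2D = List[List[float]]  # one H x W map


def foreground_mask(attention_maps: List[Map2D], threshold: float = 0.5) -> List[List[int]]:
    """Collapse a per-character attention map sequence into one binary mask.

    A pixel is marked as foreground if ANY character's attention map exceeds
    `threshold` there. The threshold rule is an assumption for this sketch.
    """
    height = len(attention_maps[0])
    width = len(attention_maps[0][0])
    mask = [[0] * width for _ in range(height)]
    for amap in attention_maps:
        for y in range(height):
            for x in range(width):
                if amap[y][x] > threshold:
                    mask[y][x] = 1
    return mask


def select_character_features(features: List[Map2D], mask: List[List[int]]) -> List[List[float]]:
    """Gather the C-dimensional feature vector at every foreground pixel.

    `features` is laid out as C x H x W; the result is a list of C-length
    vectors in row-major pixel order, with background pixels dropped.
    """
    selected = []
    for y, row in enumerate(mask):
        for x, keep in enumerate(row):
            if keep:
                selected.append([channel[y][x] for channel in features])
    return selected


# Toy input: two characters attending to the left and right columns of a 2 x 3 image.
attention_maps = [
    [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],  # hypothetical map for character 1
    [[0.0, 0.1, 0.7], [0.0, 0.3, 0.6]],  # hypothetical map for character 2
]
# Toy feature map with C = 2 channels.
features = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]],
]

mask = foreground_mask(attention_maps)
tokens = select_character_features(features, mask)
print(mask)
print(tokens)
```

In this toy case the middle column is treated as background and contributes no feature vectors, which illustrates the abstract's point about not treating character and non-character regions equally.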

Code & Models

Repositories

csguoh/lemma (PyTorch, official)

Taxonomy

Topics: Advanced Image Processing Techniques · Simulation and Modeling Applications · Image and Signal Denoising Methods