One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance

Minyi Zhao; Yang Wang; Jihong Guan; Shuigeng Zhou

arXiv:2409.14483·cs.CV·September 24, 2024·2 cites

Open Access

TL;DR

This paper introduces IMAGE, a novel iterative mutual guidance framework that simultaneously improves low-resolution scene text recognition and super-resolution fidelity by enabling high-level semantic and low-level pixel information exchange.

Contribution

The paper proposes a new method that separately optimizes recognition and super-resolution models with an iterative guidance mechanism for enhanced performance.
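The alternating structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function `iterative_mutual_guidance`, the toy models `toy_recognize` and `toy_recover`, the iteration count, and the update order are all hypothetical names and assumptions chosen for illustration, since this page does not describe the actual architecture. The sketch only shows two separately defined models exchanging guidance in a loop, with the recognizer consuming low-level pixel information and the recovery model consuming a high-level clue.

```python
from typing import Callable, List, Tuple

Image = List[float]  # toy stand-in for pixel data (assumption)
Clue = List[float]   # toy stand-in for a high-level recognition clue (assumption)


def iterative_mutual_guidance(
    lr_image: Image,
    recognize: Callable[[Image, Image], Clue],
    recover: Callable[[Image, Clue], Image],
    num_iters: int = 3,
) -> Tuple[Clue, Image]:
    """Alternate two models, each conditioned on the other's latest output."""
    sr_image = list(lr_image)  # initial low-level guidance: the LR input itself
    clue: Clue = []
    for _ in range(num_iters):
        # Low-level (pixel) information guides recognition.
        clue = recognize(lr_image, sr_image)
        # High-level (semantic) information guides recovery.
        sr_image = recover(lr_image, clue)
    return clue, sr_image


def toy_recognize(lr: Image, sr: Image) -> Clue:
    # Hypothetical recognizer: threshold the mean of the LR and SR pixels.
    return [1.0 if (a + b) / 2 > 0.5 else 0.0 for a, b in zip(lr, sr)]


def toy_recover(lr: Image, clue: Clue) -> Image:
    # Hypothetical recovery model: pull each LR pixel halfway toward the clue.
    return [0.5 * a + 0.5 * c for a, c in zip(lr, clue)]


if __name__ == "__main__":
    clue, sr = iterative_mutual_guidance([0.2, 0.8], toy_recognize, toy_recover)
    print(clue, sr)
```

In this sketch the two models are plain callables, so each could be trained with its own objective; only their outputs are exchanged at inference time.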

Findings

01 Outperforms existing methods in recognition accuracy on LR datasets

02 Achieves higher super-resolution fidelity compared to prior approaches

03 Demonstrates effective mutual guidance between recognition and super-resolution models

Abstract

Scene text recognition (STR) from high-resolution (HR) images has been highly successful; however, text reading on low-resolution (LR) images remains challenging due to insufficient visual information. Many scene text image super-resolution (STISR) models have therefore been proposed recently to generate super-resolution (SR) images from the LR ones; STR is then performed on the SR images, which boosts recognition performance. Nevertheless, these methods have two major weaknesses. On the one hand, STISR approaches may generate imperfect or even erroneous SR images, which mislead the subsequent recognition of STR models. On the other hand, as the STISR and STR models are jointly optimized, the fidelity of SR images may be spoiled in pursuit of high recognition accuracy. As a result, neither the recognition performance nor the fidelity of STISR models is desirable. Then, can we…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction