Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement
Hang Guo, Tao Dai, Guanghao Meng, Shu-Tao Xia

TL;DR
This paper introduces LEMMA, a novel scene text image super-resolution method that explicitly models character regions and employs multi-modal alignment to improve image quality and recognition accuracy, outperforming existing methods.
Contribution
The paper proposes LEMMA, which explicitly models character regions (the foreground) instead of treating them the same as the background, and uses bidirectional visual-semantic alignment to generate prior guidance for scene text image super-resolution.
Findings
LEMMA outperforms state-of-the-art methods on TextZoom and four recognition benchmarks.
The location enhancement module effectively extracts character region features.
The adaptive fusion module improves the integration of visual and semantic guidance.
Abstract
Scene text image super-resolution (STISR), aiming to improve image quality while boosting downstream scene text recognition accuracy, has recently achieved great success. However, most existing methods treat the foreground (character regions) and background (non-character regions) equally in the forward process, and neglect the disturbance from the complex background, thus limiting the performance. To address these issues, in this paper, we propose a novel method LEMMA that explicitly models character regions to produce high-level text-specific guidance for super-resolution. To model the location of characters effectively, we propose the location enhancement module to extract character region features based on the attention map sequence. Besides, we propose the multi-modal alignment module to perform bidirectional visual-semantic alignment to generate high-quality prior guidance, which…
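To make the idea of extracting character region features from an attention map sequence more concrete, here is a minimal sketch. It is not the authors' implementation: the function names, the max-over-sequence pooling, and the fixed threshold are all assumptions chosen for illustration, and plain Python lists stand in for tensors.

```python
from typing import List, Tuple

Map2D = List[List[float]]          # one H x W attention map
FeatureGrid = List[List[List[float]]]  # H x W grid of C-dimensional vectors


def character_region_mask(attn_maps: List[Map2D], threshold: float = 0.5) -> List[List[bool]]:
    """Collapse a sequence of per-character attention maps into one foreground mask.

    Assumption: a pixel counts as foreground if ANY map in the sequence
    attends to it with weight >= threshold (hypothetical rule).
    """
    height, width = len(attn_maps[0]), len(attn_maps[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            peak = max(m[y][x] for m in attn_maps)  # max over the sequence
            row.append(peak >= threshold)
        mask.append(row)
    return mask


def select_foreground(
    features: FeatureGrid, mask: List[List[bool]]
) -> List[Tuple[Tuple[int, int], List[float]]]:
    """Keep only the feature vectors that fall inside the foreground mask."""
    return [
        ((y, x), features[y][x])
        for y, row in enumerate(mask)
        for x, keep in enumerate(row)
        if keep
    ]


# Toy input: two attention maps (one per character) over a 2 x 3 image.
attn = [
    [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],  # first character, left side
    [[0.0, 0.1, 0.7], [0.0, 0.3, 0.6]],  # second character, right side
]
# Toy features: one channel per pixel.
feats = [[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]]

mask = character_region_mask(attn, threshold=0.5)
selected = select_foreground(feats, mask)
```

In this toy case the middle column is treated as background and dropped, so only the four pixels attended to by a character are passed on as text-specific guidance.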
Taxonomy
Topics: Advanced Image Processing Techniques · Simulation and Modeling Applications · Image and Signal Denoising Methods
