Scene Text Image Super-Resolution via Content Perceptual Loss and Criss-Cross Transformer Blocks
Rui Qin, Bin Wang, Yu-Wing Tai

TL;DR
This paper introduces TATSR, a novel text super-resolution framework utilizing Criss-Cross Transformer Blocks and Content Perceptual Loss to improve text readability and recognition across multiple languages.
Contribution
The paper proposes a new framework with orthogonal transformer-based content extraction and a content-aware loss, enhancing text super-resolution performance and generalizability.
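A content-aware loss of the kind named above can be pictured as a pixel-wise term plus a term that compares text-relevant features of the reconstructed and ground-truth images. The following is a minimal sketch under stated assumptions, not the paper's Content Perceptual Loss: `toy_features` is a hypothetical stand-in for a frozen text feature extractor, and the names `content_aware_loss` and `weight` are illustrative only.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def pixel_mse(sr: Image, hr: Image) -> float:
    """Mean squared error over all pixels (the usual pixel-wise term)."""
    total, count = 0.0, 0
    for row_sr, row_hr in zip(sr, hr):
        for a, b in zip(row_sr, row_hr):
            total += (a - b) ** 2
            count += 1
    return total / count


def toy_features(img: Image) -> List[float]:
    """Hypothetical feature extractor: column means followed by row means.

    A real system would use features from a pretrained text model instead.
    """
    height, width = len(img), len(img[0])
    col_means = [sum(img[r][c] for r in range(height)) / height
                 for c in range(width)]
    row_means = [sum(row) / width for row in img]
    return col_means + row_means


def feature_l1(sr: Image, hr: Image,
               extract: Callable[[Image], List[float]]) -> float:
    """Mean absolute difference between the two feature vectors."""
    f_sr, f_hr = extract(sr), extract(hr)
    return sum(abs(a - b) for a, b in zip(f_sr, f_hr)) / len(f_sr)


def content_aware_loss(sr: Image, hr: Image,
                       extract: Callable[[Image], List[float]] = toy_features,
                       weight: float = 0.5) -> float:
    """Pixel term plus a weighted feature term (weight is an assumed value)."""
    return pixel_mse(sr, hr) + weight * feature_l1(sr, hr, extract)


hr = [[0.0, 1.0], [0.0, 1.0]]  # ground truth: a vertical stroke
sr = [[0.0, 0.0], [0.0, 1.0]]  # reconstruction with part of the stroke missing
loss = content_aware_loss(sr, hr)
```

Setting `weight=0.0` reduces the sketch to a plain pixel-wise loss, which makes it easy to see how much the feature term contributes for a given pair of images.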
Findings
Outperforms state-of-the-art methods in recognition accuracy
Improves human perception of reconstructed text images
Effective across multiple languages
Abstract
Text image super-resolution is a unique and important task that enhances the readability of text images for humans. It is widely used as a pre-processing step in scene text recognition. However, due to the complex degradation in natural scenes, recovering high-resolution text from low-resolution inputs is ambiguous and challenging. Existing methods mainly leverage deep neural networks trained with pixel-wise losses designed for natural image reconstruction, which ignore the unique characteristics of text characters. A few works have proposed content-based losses. However, they focus only on the text recognizers' accuracy, while the reconstructed images may still be ambiguous to humans. Further, they often generalize poorly across languages. To this end, we present TATSR, a Text-Aware Text Super-Resolution framework, which effectively learns the unique text characteristics using…
Taxonomy
Topics: Advanced Image Processing Techniques
Methods: Multi-Head Attention · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Adam · Dense Connections
