Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing
Yadong Qu, Yuxin Wang, Bangbang Zhou, Zixiao Wang, Hongtao Xie, Yongdong Zhang

TL;DR
This paper introduces a semi-supervised scene text recognition framework inspired by the human learning process of viewing and summarizing. It combines contrastive learning, online generation of background-free synthetic samples, and a new alignment loss to improve recognition of challenging texts.
Contribution
It proposes a self-motivated contrastive learning framework with an online generation strategy and a character unidirectional alignment loss to enhance character morphology understanding and recognition accuracy.
Findings
Achieves state-of-the-art accuracy on benchmark datasets.
Effectively enriches character morphology diversity.
Improves recognition of complex and artistic texts.
Abstract
Existing scene text recognition (STR) methods struggle to recognize challenging texts, especially for artistic and severely distorted characters. The limitation lies in the insufficient exploration of character morphologies, including the monotonousness of widely used synthetic training data and the sensitivity of the model to character morphologies. To address these issues, inspired by the human learning process of viewing and summarizing, we facilitate the contrastive learning-based STR framework in a self-motivated manner by leveraging synthetic and real unlabeled data without any human cost. In the viewing process, to compensate for the simplicity of synthetic data and enrich character morphology diversity, we propose an Online Generation Strategy to generate background-free samples with diverse character styles. By excluding background noise distractions, the model is encouraged to…
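The truncated abstract does not specify the paper's loss functions, so the following is only a generic illustration of the contrastive idea it mentions: pulling the feature of a character in a scene image toward the feature of its background-free generated counterpart, and away from other characters. It uses a standard InfoNCE loss, not the paper's character unidirectional alignment loss. The function names `l2_normalize` and `info_nce`, the `temperature` value, and the toy feature vectors are all assumptions made for this sketch.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the paper's loss).

    anchors[i] is assumed to be the feature of a character from a scene
    image, and positives[i] the feature of the matching background-free
    sample. Every positives[j] with j != i acts as a negative.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the target.
        total += log_denom - logits[i]
    return total / len(a)


# Toy features: two characters, two dimensions (hypothetical values).
scene_feats = [[1.0, 0.0], [0.0, 1.0]]
clean_feats_matched = [[1.0, 0.0], [0.0, 1.0]]
clean_feats_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = info_nce(scene_feats, clean_feats_matched)
loss_swapped = info_nce(scene_feats, clean_feats_swapped)
```

In this toy setup the loss is small when each scene feature lines up with its own background-free counterpart and large when the pairs are swapped, which is the behavior a contrastive objective of this kind is meant to reward.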
Taxonomy
Topics: Text and Document Classification Technologies · Handwritten Text Recognition Techniques · Advanced Text Analysis Techniques
