Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition
Mingkun Yang, Minghui Liao, Pu Lu, Jing Wang, Shenggao Zhu, Hualin Luo, Qi Tian, Xiang Bai

TL;DR
Inspired by how humans learn text through both reading and writing, this paper introduces a self-supervised text recognition approach that combines contrastive learning and masked image modeling to improve recognition accuracy on real-world data.
Contribution
This is the first work to integrate contrastive learning and masked image modeling for self-supervised text recognition. The combination is especially effective on irregular scene text datasets.
Findings
Outperforms previous self-supervised methods by 10.2%-20.2% on irregular datasets.
Surpasses state-of-the-art methods by an average of 5.3% across 11 benchmarks.
The pre-trained model also brings significant performance gains to other text-related tasks.
Abstract
Existing text recognition methods usually need large-scale training data. Most of them rely on synthetic training data due to the lack of annotated real images. However, there is a domain gap between the synthetic data and real data, which limits the performance of the text recognition models. Recent self-supervised text recognition methods attempted to utilize unlabeled real images by introducing contrastive learning, which mainly learns the discrimination of the text images. Inspired by the observation that humans learn to recognize the texts through both reading and writing, we propose to learn discrimination and generation by integrating contrastive learning and masked image modeling in our self-supervised method. The contrastive learning branch is adopted to learn the discrimination of text images, which imitates the reading behavior of humans. Meanwhile, masked image modeling is…
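The abstract pairs a discriminative objective (contrastive learning, the "reading" branch) with a generative one (masked image modeling, the "writing" branch). The sketch below shows, on toy lists instead of real encoder outputs, what each kind of loss computes: an InfoNCE loss over two augmented views of each text image, and a mean-squared-error loss restricted to masked patches (the generic pattern used by masked-image-modeling methods such as MAE and SimMIM). It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the temperature of 0.2, the mask ratio, and combining the two losses as a weighted sum with `mim_weight` are all hypothetical choices.

```python
# Illustrative sketch only: all names and hyperparameters are hypothetical,
# not taken from the paper.
import math
import random


def cosine(u, v):
    """Cosine similarity of two vectors (assumed to be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(queries, keys, temperature=0.2):
    """Discriminative branch: InfoNCE loss.

    queries[i] and keys[i] embed two augmented views of the same text image
    (the positive pair); every keys[j] with j != i acts as a negative.
    The temperature value is an assumption.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(queries)


def random_patch_mask(num_patches, mask_ratio, rng):
    """Generative branch, step 1: choose which patches to hide from the encoder."""
    num_masked = round(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_reconstruction_loss(pred_patches, target_patches, mask):
    """Generative branch, step 2: mean squared error on the masked patches only."""
    errors = [
        sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
        for pred, target, hidden in zip(pred_patches, target_patches, mask)
        if hidden
    ]
    return sum(errors) / len(errors) if errors else 0.0


def total_loss(contrastive, reconstruction, mim_weight=1.0):
    """Hypothetical combination: a weighted sum of the two objectives."""
    return contrastive + mim_weight * reconstruction


if __name__ == "__main__":
    rng = random.Random(0)
    views_a = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings of two text images
    views_b = [[0.9, 0.1], [0.1, 0.9]]  # toy embeddings of their second views
    l_con = info_nce(views_a, views_b)
    mask = random_patch_mask(num_patches=8, mask_ratio=0.5, rng=rng)
    targets = [[0.5, 0.5]] * 8  # toy pixel values per patch
    preds = [[0.4, 0.6]] * 8    # toy decoder outputs per patch
    l_mim = masked_reconstruction_loss(preds, targets, mask)
    print(total_loss(l_con, l_mim))
```

Computing the reconstruction error only on hidden patches forces the model to infer missing strokes from visible context, while the contrastive term pulls views of the same word image together and pushes different images apart. The actual loss weighting, masking ratio, and network design should be taken from the paper itself.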
Taxonomy
Topics: Handwritten Text Recognition Techniques · Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies
Methods: Contrastive Learning
