LEGO: Self-Supervised Representation Learning for Scene Text Images
Yujin Ren, Jiaxin Zhang, Lianwen Jin

TL;DR
LEGO introduces a self-supervised learning approach tailored for scene text images, leveraging unlabeled real data and novel pre-text tasks to improve recognition accuracy and generalization across multiple benchmarks.
Contribution
The paper proposes LEGO, a self-supervised method with three novel pre-text tasks designed specifically for scene text images. It targets a limitation of generic self-supervised methods, which do not account for the sequential nature of scene text.
Findings
LEGO outperforms previous self-supervised methods in scene text recognition.
The pre-trained model achieves state-of-the-art or comparable results on six benchmarks.
LEGO also improves performance in other text-related tasks.
Abstract
In recent years, significant progress has been made in scene text recognition by data-driven methods. However, due to the scarcity of annotated real-world data, the training of these methods predominantly relies on synthetic data. The distribution gap between synthetic and real data constrains the further performance improvement of these methods in real-world applications. To tackle this problem, a highly promising approach is to utilize massive amounts of unlabeled real data for self-supervised training, which has been widely proven effective in many NLP and CV tasks. Nevertheless, generic self-supervised methods are unsuitable for scene text images because of the sequential nature of such images. To address this issue, we propose a Local Explicit and Global Order-aware self-supervised representation learning method (LEGO) that accounts for the characteristics of scene text images. Inspired by the…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction
