LEGO: Self-Supervised Representation Learning for Scene Text Images

Yujin Ren; Jiaxin Zhang; Lianwen Jin

arXiv:2408.02036·cs.CV·August 6, 2024

TL;DR

LEGO introduces a self-supervised learning approach tailored for scene text images, leveraging unlabeled real data and novel pre-text tasks to improve recognition accuracy and generalization across multiple benchmarks.

Contribution

The paper proposes LEGO, a self-supervised method with three novel pre-text tasks designed specifically for scene text images, addressing the limitations of generic self-supervised methods.

Findings

01 LEGO outperforms previous self-supervised methods in scene text recognition.

02 The pre-trained model achieves state-of-the-art or comparable results on six benchmarks.

03 LEGO also improves performance in other text-related tasks.

Abstract

In recent years, significant progress has been made in scene text recognition by data-driven methods. However, due to the scarcity of annotated real-world data, the training of these methods predominantly relies on synthetic data. The distribution gap between synthetic and real data constrains the further performance improvement of these methods in real-world applications. To tackle this problem, a highly promising approach is to utilize massive amounts of unlabeled real data for self-supervised training, which has been widely proven effective in many NLP and CV tasks. Nevertheless, generic self-supervised methods are unsuitable for scene text images due to their sequential nature. To address this issue, we propose a Local Explicit and Global Order-aware self-supervised representation learning method (LEGO) that accounts for the characteristics of scene text images. Inspired by the…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction