Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition

Zuan Gao; Yuxin Wang; Yadong Qu; Boqiang Zhang; Zixiao Wang; Jianjun Xu; Hongtao Xie

arXiv:2405.05841·cs.CV·May 14, 2024

TL;DR

This paper introduces Symmetric Superimposition Modeling (SSM), a self-supervised pre-training method that captures both local character features and linguistic information in scene text recognition, leading to significant performance improvements.

Contribution

The novel SSM approach models both pixel-level and feature-level linguistic information using symmetric superimposition, enhancing text recognition accuracy without reliance on extensive annotations.

Findings

01. Achieves 4.1% average performance gain on benchmarks.
02. Sets new state-of-the-art with 86.6% average word accuracy.
03. Demonstrates effectiveness and generality across various datasets.

Abstract

In text recognition, self-supervised pre-training emerges as a good solution to reduce dependence on expansive annotated real data. Previous studies primarily focus on local visual representation by leveraging mask image modeling or sequence contrastive learning. However, they omit modeling the linguistic information in text images, which is crucial for recognizing text. To simultaneously capture local character features and linguistic information in visual space, we propose Symmetric Superimposition Modeling (SSM). The objective of SSM is to reconstruct the direction-specific pixel and feature signals from the symmetrically superimposed input. Specifically, we add the original image with its inverted views to create the symmetrically superimposed inputs. At the pixel level, we reconstruct the original and inverted images to capture character shapes and texture-level linguistic context.…
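
The input construction described in the abstract (adding the original image to an inverted view of itself) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names `invert_view` and `superimpose`, the three inversion modes, the equal 0.5 blending weight, and the use of nested lists for a grayscale image are all assumptions made for the example.

```python
from typing import List, Tuple

# A grayscale image as rows of pixel intensities (assumed representation).
Image = List[List[float]]


def invert_view(img: Image, mode: str) -> Image:
    """Return a flipped copy of img. The mode names are hypothetical."""
    if mode == "horizontal":   # mirror left-right
        return [row[::-1] for row in img]
    if mode == "vertical":     # mirror top-bottom
        return [row[:] for row in img[::-1]]
    if mode == "rotate180":    # both flips, i.e. a 180-degree rotation
        return [row[::-1] for row in img[::-1]]
    raise ValueError(f"unknown mode: {mode}")


def superimpose(img: Image, mode: str = "rotate180",
                weight: float = 0.5) -> Tuple[Image, Image]:
    """Blend img with its inverted view.

    Returns (superimposed input, inverted view). The original and the
    inverted view would serve as the two reconstruction targets.
    The blending weight is an assumption, not taken from the paper.
    """
    inv = invert_view(img, mode)
    mixed = [
        [weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img, inv)
    ]
    return mixed, inv


if __name__ == "__main__":
    image = [[1.0, 0.0],
             [0.0, 0.0]]
    mixed, inverted = superimpose(image)
    print(mixed)
    print(inverted)
```

With an equal weight, the superimposed input is unchanged by the same inversion, so the input alone does not reveal which direction is the "original". That is consistent with the abstract's framing of reconstructing direction-specific signals from a symmetric input.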

Code & Models

Repositories

faltingsa/ssm (PyTorch, official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction

Methods: Focus