Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
Zuan Gao, Yuxin Wang, Yadong Qu, Boqiang Zhang, Zixiao Wang, Jianjun Xu, Hongtao Xie

TL;DR
This paper introduces Symmetric Superimposition Modeling (SSM), a self-supervised pre-training method that captures both local character features and linguistic information in scene text recognition, leading to significant performance improvements.
Contribution
The novel SSM approach models both pixel-level and feature-level linguistic information using symmetric superimposition, enhancing text recognition accuracy without reliance on extensive annotations.
Findings
Achieves 4.1% average performance gain on benchmarks.
Sets new state-of-the-art with 86.6% average word accuracy.
Demonstrates effectiveness and generality across various datasets.
Abstract
In text recognition, self-supervised pre-training emerges as a good solution to reduce dependence on expansive annotated real data. Previous studies primarily focus on local visual representation by leveraging mask image modeling or sequence contrastive learning. However, they omit modeling the linguistic information in text images, which is crucial for recognizing text. To simultaneously capture local character features and linguistic information in visual space, we propose Symmetric Superimposition Modeling (SSM). The objective of SSM is to reconstruct the direction-specific pixel and feature signals from the symmetrically superimposed input. Specifically, we add the original image with its inverted views to create the symmetrically superimposed inputs. At the pixel level, we reconstruct the original and inverted images to capture character shapes and texture-level linguistic context.…
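The superimposition step and the pixel-level objective described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names `superimpose` and `pixel_reconstruction_loss`, the equal-weight blend (`alpha=0.5`), the three flip modes, and the mean-squared-error loss are all assumptions made here for illustration. The abstract only states that the original image is added to its inverted views and that both are reconstructed.

```python
import numpy as np


def superimpose(image, flip="horizontal", alpha=0.5):
    """Blend a grayscale image (H, W) with a flipped view of itself.

    The flip modes and the blend weight are hypothetical choices;
    the source only says the original is added to its inverted views.
    """
    if flip == "horizontal":
        inverted = image[:, ::-1]
    elif flip == "vertical":
        inverted = image[::-1, :]
    elif flip == "rotate180":
        inverted = image[::-1, ::-1]
    else:
        raise ValueError(f"unknown flip mode: {flip}")
    mixed = alpha * image + (1.0 - alpha) * inverted
    return mixed, inverted


def pixel_reconstruction_loss(pred_original, pred_inverted, image, inverted):
    """Sum of MSE terms for the two direction-specific targets (assumed loss)."""
    loss_original = np.mean((pred_original - image) ** 2)
    loss_inverted = np.mean((pred_inverted - inverted) ** 2)
    return float(loss_original + loss_inverted)


# Stand-in for a 32x128 text-line image with values in [0, 1).
rng = np.random.default_rng(0)
image = rng.random((32, 128))

mixed, inverted = superimpose(image, flip="horizontal")

# A model would take `mixed` as input and predict both views; here the
# ground-truth views are used as perfect predictions to show the loss shape.
perfect_loss = pixel_reconstruction_loss(image, inverted, image, inverted)
```

With an equal-weight horizontal blend, the mixed input is left-right symmetric, so recovering either view requires telling apart the two overlapping character sequences. That is one plausible reading of why the task would push a model toward both character shape and context, though the source does not spell out this mechanism.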
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction
Methods: Focus
