Relational Contrastive Learning for Scene Text Recognition
Jinglei Zhang, Tiancheng Lin, Yi Xu, Kai Chen, Rui Zhang

TL;DR
This paper introduces RCLSTR, a self-supervised learning framework for scene text recognition (STR) that enriches textual relations through rearrangement, hierarchy, and interaction, improving representation robustness and outperforming existing self-supervised STR methods.
Contribution
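The paper proposes a relational contrastive learning framework that counters over-fitting in scene text recognition, which arises because textual relations are limited to those present in a finite dataset, by enriching those relations. A causality-based analysis argues that the enrichment suppresses the bias introduced by the contextual prior.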
Findings
Outperforms state-of-the-art self-supervised STR methods
Enriching textual relations improves representation robustness
Theoretical analysis confirms bias suppression
Abstract
Context-aware methods have achieved great success in supervised scene text recognition by incorporating semantic priors from words. We argue that such prior contextual information can be interpreted as the relations of textual primitives due to the heterogeneous text and background, which can provide effective self-supervised labels for representation learning. However, textual relations are restricted to the finite size of the dataset due to lexical dependencies, which causes over-fitting and compromises representation robustness. To this end, we propose to enrich the textual relations via rearrangement, hierarchy and interaction, and design a unified framework called RCLSTR: Relational Contrastive Learning for Scene Text Recognition. Based on causality, we theoretically explain that the three modules suppress the bias caused by the contextual prior and thus guarantee representation…
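As a rough illustration of the two ideas the abstract names, contrastive learning and rearrangement, the sketch below pairs a generic InfoNCE loss with a simple segment shuffle over a sequence of frame features. It is a minimal sketch under stated assumptions, not the RCLSTR objective or the authors' rearrangement module: the function names `rearrange_segments` and `info_nce`, the `temperature` value, and the choice of contiguous equal-size segments are all hypothetical and chosen only for illustration.

```python
import math
import random


def rearrange_segments(frames, n_segments, seed=0):
    """Cut a frame sequence into contiguous segments and shuffle their order.

    Hypothetical stand-in for a rearrangement step: it produces new
    orderings of textual primitives from an existing sequence.
    """
    size = math.ceil(len(frames) / n_segments)
    segments = [frames[i:i + size] for i in range(0, len(frames), size)]
    random.Random(seed).shuffle(segments)
    return [frame for segment in segments for frame in segment]


def _normalize(vector):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def info_nce(queries, keys, temperature=0.1):
    """Generic InfoNCE loss: queries[i] and keys[i] form the positive pair,
    and every other key in the batch acts as a negative."""
    q = [_normalize(v) for v in queries]
    k = [_normalize(v) for v in keys]
    total = 0.0
    for i, q_i in enumerate(q):
        logits = [sum(a * b for a, b in zip(q_i, k_j)) / temperature for k_j in k]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(q)
```

In this toy setup the loss is small when each query is closest to its own key and large when the pairs are mismatched, which is the behavior a contrastive objective relies on.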
Taxonomy
Topics: Handwritten Text Recognition Techniques · Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis
Methods: Contrastive Learning
