SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
Mingxin Huang, Yuliang Liu, Zhenghao Peng, Chongyu Liu, Dahua Lin, Shenggao Zhu, Nicholas Yuan, Kai Ding, Lianwen Jin

TL;DR
SwinTextSpotter is an end-to-end scene text spotting framework that strengthens the synergy between detection and recognition by pairing a transformer-based detector with a Recognition Conversion mechanism. It outperforms existing methods without an additional rectification module or character-level annotation.
Contribution
The paper introduces a unified framework in which a Recognition Conversion mechanism explicitly guides text localization through the recognition loss, rather than coupling detection and recognition only through a shared backbone.
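To make the idea of "guiding localization through recognition loss" concrete, here is a minimal sketch, not the paper's implementation. It assumes a toy setup in which the detector emits per-position mask logits, the recognizer is a single linear layer, and the recognizer only sees features gated by the detector's mask. All names (`mask_logits`, `features`, `weights`, `recognition_loss`) are hypothetical. The point it illustrates is that once the detector's output multiplies the recognizer's input, the recognition loss has a non-zero gradient with respect to the detector's output, which plain backbone sharing does not provide.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def recognition_loss(mask_logits, features, weights, target):
    """Squared error of a toy linear recognizer fed with mask-gated features."""
    # The detector's mask gates the features before recognition (assumed design).
    gated = [sigmoid(z) * f for z, f in zip(mask_logits, features)]
    pred = sum(w * g for w, g in zip(weights, gated))
    return (pred - target) ** 2


def grad_wrt_mask_logits(mask_logits, features, weights, target):
    """Analytic gradient of the recognition loss w.r.t. the detector's mask logits."""
    gated = [sigmoid(z) * f for z, f in zip(mask_logits, features)]
    pred = sum(w * g for w, g in zip(weights, gated))
    residual = 2.0 * (pred - target)
    # Chain rule: dL/dz_i = dL/dpred * w_i * f_i * sigmoid'(z_i)
    return [
        residual * w * f * sigmoid(z) * (1.0 - sigmoid(z))
        for z, f, w in zip(mask_logits, features, weights)
    ]


# Hypothetical toy values, chosen only for illustration.
mask_logits = [0.5, -1.0, 2.0]   # detector output (before sigmoid)
features = [1.0, 0.8, -0.3]      # shared backbone features
weights = [0.4, -0.6, 0.9]       # toy recognizer weights
target = 1.0                     # toy recognition target

grads = grad_wrt_mask_logits(mask_logits, features, weights, target)
print(grads)
```

Every entry of `grads` is non-zero here, so a gradient step on the recognition loss would adjust the detector's mask. In the toy model, that is the sense in which recognition supervises localization.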
Findings
Outperforms existing methods on multiple datasets
Does not require additional rectification modules
Handles arbitrarily-shaped and multi-lingual text effectively
Abstract
End-to-end scene text spotting has attracted great attention in recent years due to the success of excavating the intrinsic synergy of the scene text detection and recognition. However, recent state-of-the-art methods usually incorporate detection and recognition simply by sharing the backbone, which does not directly take advantage of the feature interaction between the two tasks. In this paper, we propose a new end-to-end scene text spotting framework termed SwinTextSpotter. Using a transformer encoder with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism to explicitly guide text localization through recognition loss. The straightforward design results in a concise framework that requires neither additional rectification module nor character-level annotation for the arbitrarily-shaped text. Qualitative and quantitative experiments on…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Speech Recognition and Synthesis · Hand Gesture Recognition Systems
