First-place Solution for Streetscape Shop Sign Recognition Competition
Bin Wang, Li Jing

TL;DR
This paper presents a top solution for street-view sign recognition, combining multimodal features, self-supervised learning, and Transformer models to improve accuracy in complex urban scenes.
Contribution
The paper introduces a novel multistage approach integrating multimodal feature fusion, self-supervised training, and Transformer-based models for street sign recognition.
Findings
Achieved first place in the streetscape shop sign recognition competition.
Demonstrated the effectiveness of BoxDQN, a reinforcement-learning-based technique, and of text rectification methods.
Validated methods through comprehensive experiments.
Abstract
Text recognition technology applied to street-view storefront signs is increasingly utilized across various practical domains, including map navigation, smart city planning analysis, and business value assessments in commercial districts. This technology holds significant research and commercial potential. Nevertheless, it faces numerous challenges. Street view images often contain signboards with complex designs and diverse text styles, complicating the text recognition process. A notable advancement in this field was introduced by our team in a recent competition. We developed a novel multistage approach that integrates multimodal feature fusion, extensive self-supervised training, and a Transformer-based large model. Furthermore, innovative techniques such as BoxDQN, which relies on reinforcement learning, and text rectification methods were employed, leading to impressive outcomes.…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Text and Document Classification Technologies
