First-place Solution for Streetscape Shop Sign Recognition Competition

Bin Wang; Li Jing

arXiv:2501.02811 · cs.CV · April 23, 2025

Open Access

TL;DR

This paper presents the first-place solution to a streetscape shop sign recognition competition, combining multimodal feature fusion, self-supervised training, and a Transformer-based large model to improve text recognition accuracy in complex urban scenes.

Contribution

The paper introduces a novel multistage approach integrating multimodal feature fusion, self-supervised training, and Transformer-based models for street sign recognition.

Findings

01

Achieved state-of-the-art accuracy in sign recognition tasks.

02

Demonstrated the effectiveness of BoxDQN, a reinforcement-learning-based technique, and of text rectification methods.

03

Validated methods through comprehensive experiments.

Abstract

Text recognition technology applied to street-view storefront signs is increasingly utilized across various practical domains, including map navigation, smart city planning analysis, and business value assessments in commercial districts. This technology holds significant research and commercial potential. Nevertheless, it faces numerous challenges. Street view images often contain signboards with complex designs and diverse text styles, complicating the text recognition process. A notable advancement in this field was introduced by our team in a recent competition. We developed a novel multistage approach that integrates multimodal feature fusion, extensive self-supervised training, and a Transformer-based large model. Furthermore, innovative techniques such as BoxDQN, which relies on reinforcement learning, and text rectification methods were employed, leading to impressive outcomes.…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Text and Document Classification Technologies