Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching

Xiaojie Chu; Yongtao Wang; Chunhua Shen; Jingdong Chen; Wei Chu

arXiv:2203.06696 · cs.CV · March 18, 2022 · 1 citation

Open Access · 2 Repos

TL;DR

This paper introduces a training protocol search method that tunes the hyper-parameters used to train scene text recognition (STR) models, improving the accuracy of mainstream STR models by 2.7% to 3.9%.

Contribution

We propose a training protocol search algorithm that combines a newly designed search space with an efficient search procedure based on evolutionary optimization and proxy tasks, improving the accuracy of existing STR models.

Findings

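01. The searched training protocol improves the recognition accuracy of mainstream STR models by 2.7% to 3.9%.

02. With the searched training protocol, TRBA-Net achieves 2.1% higher accuracy than the state-of-the-art STR model.

03. TRBA-Net trained with the searched protocol also runs inference faster than that state-of-the-art model on both CPU and GPU.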

Abstract

The development of scene text recognition (STR) in the era of deep learning has been mainly focused on novel architectures of STR models. However, training protocol (i.e., settings of the hyper-parameters involved in the training of STR models), which plays an equally important role in successfully training a good STR model, is under-explored for scene text recognition. In this work, we attempt to improve the accuracy of existing STR models by searching for optimal training protocol. Specifically, we develop a training protocol search algorithm, based on a newly designed search space and an efficient search algorithm using evolutionary optimization and proxy tasks. Experimental results show that our searched training protocol can improve the recognition accuracy of mainstream STR models by 2.7%~3.9%. In particular, with the searched training protocol, TRBA-Net achieves 2.1% higher…
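
The search procedure described in the abstract (evolutionary optimization over a protocol search space, with candidates scored on a cheap proxy task) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the hyper-parameter names and values in `SEARCH_SPACE`, the mutation scheme, and the synthetic `proxy_score` function are hypothetical and are not taken from the paper. In practice the proxy task would be a short, reduced-cost training run that returns validation accuracy.

```python
import random

# Hypothetical search space: names and candidate values are illustrative only,
# not the search space defined in the paper.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "weight_decay": [0.0, 1e-5, 1e-4, 1e-3],
    "optimizer": ["sgd", "adam", "adadelta"],
    "augmentation": ["none", "light", "strong"],
}


def sample_protocol(rng):
    """Draw one random training protocol from the search space."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}


def mutate(protocol, rng, rate=0.3):
    """Copy a protocol and resample each hyper-parameter with probability `rate`."""
    child = dict(protocol)
    for name, choices in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[name] = rng.choice(choices)
    return child


def proxy_score(protocol):
    """Synthetic stand-in for a proxy task (assumption, not the paper's proxy).

    A real proxy task would train the STR model cheaply and return accuracy.
    Here the score is the fraction of settings matching an arbitrary target.
    """
    target = {
        "learning_rate": 1e-3,
        "weight_decay": 1e-4,
        "optimizer": "adam",
        "augmentation": "strong",
    }
    return sum(protocol[k] == v for k, v in target.items()) / len(target)


def evolutionary_search(score_fn, generations=20, population_size=16,
                        num_parents=4, seed=0):
    """Simple elitist evolutionary search over training protocols."""
    rng = random.Random(seed)
    population = [sample_protocol(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Keep the top-scoring protocols as parents (elitism).
        parents = sorted(population, key=score_fn, reverse=True)[:num_parents]
        # Refill the population with mutated copies of randomly chosen parents.
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population_size - num_parents)]
        population = parents + children
    best = max(population, key=score_fn)
    return best, score_fn(best)


if __name__ == "__main__":
    best_protocol, best_score = evolutionary_search(proxy_score)
    print(best_protocol, best_score)
```

Because the parents are carried over unchanged into each new generation, the best proxy score in the population never decreases. Whether a protocol that scores well on the proxy task also performs well under full training depends on how faithful the proxy is, which this sketch does not model.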

Taxonomy

Topics: Handwritten Text Recognition Techniques · Machine Learning and Data Classification · Advanced Neural Network Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings