Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching
Xiaojie Chu, Yongtao Wang, Chunhua Shen, Jingdong Chen, Wei Chu

TL;DR
This paper introduces a training protocol search method that optimizes hyper-parameters to significantly improve scene text recognition accuracy and speed, outperforming existing models.
Contribution
We propose a novel training protocol search algorithm using evolutionary optimization and proxy tasks, enhancing STR model performance and efficiency.
Findings
Improved recognition accuracy by 2.7% to 3.9% on mainstream STR models.
Achieved 2.1% higher accuracy with TRBA-Net than the state-of-the-art STR model.
TRBA-Net trained with the searched protocol also runs faster on CPU and GPU than the state-of-the-art model.
Abstract
The development of scene text recognition (STR) in the era of deep learning has mainly focused on novel architectures of STR models. However, the training protocol (i.e., the settings of the hyper-parameters involved in training STR models), which plays an equally important role in successfully training a good STR model, is under-explored for scene text recognition. In this work, we attempt to improve the accuracy of existing STR models by searching for an optimal training protocol. Specifically, we develop a training protocol search algorithm, based on a newly designed search space and an efficient search algorithm using evolutionary optimization and proxy tasks. Experimental results show that our searched training protocol can improve the recognition accuracy of mainstream STR models by 2.7% to 3.9%. In particular, with the searched training protocol, TRBA-Net achieves 2.1% higher…
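The abstract describes the search only at a high level. As a rough illustration of the general technique, a minimal evolutionary search over a discrete hyper-parameter space, with elitism, crossover and mutation, can be sketched as follows. Everything here is a hypothetical stand-in, not the authors' method: the search space, the operators, and `toy_proxy_fitness` are assumptions. In the paper's setting, a candidate's fitness would come from training and evaluating an STR model on a cheaper proxy task, not from a closed-form formula.

```python
import math
import random

# Hypothetical search space: each key is a training hyper-parameter and each
# value lists its discrete choices. Illustrative only, not the paper's space.
SEARCH_SPACE = {
    "optimizer": ["adam", "adadelta", "sgd"],
    "learning_rate": [1e-4, 3e-4, 1e-3, 1.0],
    "batch_size": [96, 192, 384],
    "label_smoothing": [0.0, 0.1, 0.2],
    "aug_prob": [0.0, 0.25, 0.5],
}


def sample_protocol(rng):
    """Draw a random training protocol from the search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Take each hyper-parameter from either parent with equal probability."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(protocol, rng):
    """Copy a protocol and change exactly one hyper-parameter to a new value."""
    child = dict(protocol)
    key = rng.choice(list(SEARCH_SPACE))
    options = [v for v in SEARCH_SPACE[key] if v != child[key]]
    child[key] = rng.choice(options)
    return child


def toy_proxy_fitness(protocol):
    """Hypothetical stand-in for a proxy task (e.g. short training on a data
    subset). A made-up closed-form score so the sketch runs without a model."""
    score = {"adam": 0.3, "adadelta": 0.2, "sgd": 0.1}[protocol["optimizer"]]
    score -= 0.1 * abs(math.log10(protocol["learning_rate"] / 3e-4))
    score += 0.05 if protocol["batch_size"] == 192 else 0.0
    score -= abs(protocol["label_smoothing"] - 0.1)
    score += 0.2 * protocol["aug_prob"]
    return score


def evolve(fitness, population_size=16, generations=10, elite=4, seed=0):
    """Elitist evolutionary search; returns (best_protocol, best_score)."""
    rng = random.Random(seed)
    cache = {}  # avoid re-running the (expensive) proxy task for duplicates

    def evaluate(p):
        key = tuple(sorted(p.items()))
        if key not in cache:
            cache[key] = fitness(p)
        return cache[key]

    population = [sample_protocol(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        parents = ranked[:elite]  # the best candidates survive unchanged
        children = []
        while len(children) < population_size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    best = max(population, key=evaluate)
    return best, evaluate(best)


if __name__ == "__main__":
    best, score = evolve(toy_proxy_fitness)
    print(best, round(score, 3))
```

Because the top `elite` candidates are carried over unchanged, the best proxy score never decreases across generations. The cache matters in practice: with a real proxy task, each evaluation is a (shortened) training run, so it is the dominant cost of the search.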
Taxonomy
Topics: Handwritten Text Recognition Techniques · Machine Learning and Data Classification · Advanced Neural Network Applications
