Runner-Up Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: Cropped Word Recognition

Zhangzi Zhu; Yu Hao; Wenqing Zhang; Chuhui Xue; Song Bai

arXiv:2208.02747 · cs.CV · September 1, 2022

TL;DR

This report describes the second-place solution for recognizing out-of-vocabulary words in cropped scene-text images. It pre-trains on synthetic data, fine-tunes with data augmentation, and ensembles multiple models to improve accuracy.

Contribution

The authors propose a multi-model ensemble built on SCATTER, with additional models trained specifically for long and vertical text, and reach 59.45% word accuracy on out-of-vocabulary words.

Findings

01

Achieved 59.45% word accuracy on OOV words

02

Effective use of synthetic pre-training and data augmentation

03

Ensemble of diverse models improves recognition performance

Abstract

This report presents our 2nd place solution to the ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST): Cropped Word Recognition. The challenge, held as part of the ECCV 2022 workshop on Text in Everything (TiE), aims to extract out-of-vocabulary words from natural scene images. In the competition, we first pre-train SCATTER on synthetic datasets, then fine-tune the model on the training set with data augmentations. Meanwhile, two additional models are trained specifically for long and vertical texts. Finally, we combine the outputs of models with different layers, different backbones, and different seeds to produce the final results. Our solution achieves a word accuracy of 59.45% when considering out-of-vocabulary words only.
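The abstract does not state how the model outputs are combined or how the score is computed in detail, so the following is only a minimal sketch under stated assumptions: each model is assumed to return a decoded word with a confidence score, the ensemble is assumed to pick the word with the highest summed confidence, and word accuracy is assumed to be case-sensitive exact match over words absent from a training vocabulary. The names `ensemble_vote` and `oov_word_accuracy` are hypothetical and do not come from the report.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set, Tuple

# Assumed per-model output: (decoded word, confidence in [0, 1]).
Prediction = Tuple[str, float]


def ensemble_vote(predictions: Iterable[Prediction]) -> str:
    """Return the word whose summed confidence across models is highest.

    This voting rule is an assumption for illustration; the report only
    says that outputs from different models are combined.
    """
    scores: Dict[str, float] = defaultdict(float)
    for word, confidence in predictions:
        scores[word] += confidence
    if not scores:
        raise ValueError("need at least one prediction")
    # On a tie, max() keeps the word that was seen first.
    return max(scores, key=scores.get)


def oov_word_accuracy(
    predicted: Sequence[str],
    truth: Sequence[str],
    vocabulary: Set[str],
) -> float:
    """Exact-match accuracy restricted to ground-truth words not in `vocabulary`."""
    pairs = [(p, t) for p, t in zip(predicted, truth) if t not in vocabulary]
    if not pairs:
        return 0.0
    return sum(p == t for p, t in pairs) / len(pairs)


if __name__ == "__main__":
    # Three hypothetical models disagree on one cropped word.
    per_model = [("kaffee", 0.55), ("koffee", 0.40), ("kaffee", 0.35)]
    print(ensemble_vote(per_model))

    # Only "zyx" and "qwu" are out of vocabulary here; one of the two is correct.
    print(oov_word_accuracy(["cat", "zyx", "qwv"], ["cat", "zyx", "qwu"], {"cat"}))
```

Summing confidences rather than counting votes lets a single confident model outweigh two uncertain ones, which is one plausible design choice when the ensemble mixes general models with ones specialized for long or vertical text.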

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques