What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels

Jeonghun Baek; Yusuke Matsui; Kiyoharu Aizawa

arXiv:2103.04400·cs.CV·June 8, 2021·6 cites

TL;DR

This paper demonstrates that scene text recognition models can be effectively trained using only real labeled data, supplemented with data augmentation and semi/self-supervised learning, challenging the belief that synthetic data is essential.

Contribution

It is the first to show competitive STR performance with only real data and introduces semi- and self-supervised methods into this setting.

Findings

01. Models trained on real data alone achieve competitive accuracy.
02. Data augmentation and semi/self-supervised methods significantly improve performance.
03. The study challenges the necessity of synthetic data for STR training.

Abstract

The scene text recognition (STR) task has a common practice: all state-of-the-art STR models are trained on large synthetic data. In contrast to this practice, training STR models on only a small amount of real labeled data (STR with fewer labels) is important when we have to train STR models without synthetic data: for handwritten or artistic texts that are difficult to generate synthetically, and for languages other than English, for which synthetic data is not always available. However, there has been implicit common knowledge that training STR models on real data is nearly impossible because real data is insufficient. We consider that this common knowledge has obstructed the study of STR with fewer labels. In this work, we would like to reactivate STR with fewer labels by disproving the common knowledge. We consolidate recently accumulated public real data and show that we can train STR models satisfactorily…

Code & Models

Repositories

ku21fan/STR-Fewer-Labels (PyTorch, official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Multimodal Machine Learning Applications