Text Recognition in Real Scenarios with a Few Labeled Samples

Jinghuang Lin; Zhanzhan Cheng; Fan Bai; Yi Niu; Shiliang Pu; Shuigeng Zhou

arXiv:2006.12209·cs.CV·June 23, 2020

Open Access

TL;DR

This paper introduces a few-shot adversarial sequence domain adaptation method for scene text recognition, enabling high accuracy with limited labeled target samples by aligning synthetic and real data at the character level.

Contribution

The proposed FASDA approach adapts models trained on synthetic data to real-world scenarios with only a few labeled target samples, outperforming finetuning and performing comparably to state-of-the-art methods.

Findings

01 Significantly outperforms finetuning schemes.

02 Achieves comparable performance to state-of-the-art STR methods.

03 Effective in scenarios with scarce labeled data.

Abstract

Scene text recognition (STR) is still a hot research topic in the computer vision field due to its various applications. Existing works mainly focus on learning a general model with a huge number of synthetic text images to recognize unconstrained scene texts, and have achieved substantial progress. However, these methods are not quite applicable in many real-world scenarios where 1) high recognition accuracy is required, while 2) labeled samples are lacking. To tackle this challenging problem, this paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation between the synthetic source domain (with many synthetic labeled samples) and a specific target domain (with only some or a few real labeled samples). This is done by simultaneously learning each character's feature representation with an attention mechanism and establishing the…
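The abstract's idea of learning one feature representation per character with an attention mechanism can be illustrated with a generic dot-product attention sketch. This is not the FASDA architecture: the function names `softmax` and `attend`, the toy frame vectors, and the query vectors are all assumptions made for illustration, and the adversarial alignment step is not covered.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, query):
    """Generic dot-product attention for one decoding step (hypothetical helper).

    features: list of T frame vectors, each a list of D floats
    query:    list of D floats standing in for the decoder state of one character
    Returns (weights, context), where context is the attention-weighted
    sum of the frames, i.e. one feature vector for that character.
    """
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context

# Toy input: three encoder frames of dimension two (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# One hypothetical query per character to decode.
queries = [[5.0, 0.0], [0.0, 5.0]]

# One attended feature vector per character.
char_features = [attend(frames, q)[1] for q in queries]
```

Per-character vectors of this kind are the sort of unit a character-level domain alignment could operate on, in place of a single feature for the whole text image.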

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Image Processing and 3D Reconstruction