Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages

Shota Orihashi; Yoshihiro Yamazaki; Naoki Makishima; Mana Ihori; Akihiko Takashima; Tomohiro Tanaka; Ryo Masumura

arXiv:2111.12276 · cs.CV · November 25, 2021

TL;DR

This paper introduces a training approach that leverages resource-rich language datasets to improve end-to-end scene text recognition in resource-poor languages, pairing an encoder pre-trained on multilingual data with a decoder specialized for the resource-poor language.

Contribution

It proposes a novel multilingual training method that enhances scene text recognition in resource-poor languages by utilizing resource-rich datasets for encoder pre-training.

Findings

01. Effective recognition in Japanese scene text with limited data

02. Multilingual encoder captures language-invariant features

03. Decoder fine-tuned for resource-poor language

Abstract

This paper presents a novel training method for end-to-end scene text recognition. End-to-end scene text recognition offers high recognition accuracy, especially when using the encoder-decoder model based on Transformer. To train a highly accurate end-to-end model, we need to prepare a large image-to-text paired dataset for the target language. However, it is difficult to collect this data, especially for resource-poor languages. To overcome this difficulty, our proposed method utilizes well-prepared large datasets in resource-rich languages such as English, to train the resource-poor encoder-decoder model. Our key idea is to build a model in which the encoder reflects knowledge of multiple languages while the decoder specializes in knowledge of just the resource-poor language. To this end, the proposed method pre-trains the encoder by using a multilingual dataset that combines the…
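
The two-stage idea in the abstract can be illustrated with a minimal sketch of the data schedule only. It is not the authors' implementation: the names `Sample`, `build_multilingual_dataset`, and `training_plan` are hypothetical, the stage labels are illustrative, and because the abstract is cut off, the assumption that the multilingual pool is formed by merging the resource-rich and resource-poor datasets is ours. No model or Transformer code is included.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One image-to-text pair (hypothetical container for this sketch)."""
    image_id: str
    text: str
    lang: str


def build_multilingual_dataset(rich, poor, seed=0):
    """Merge both datasets into one shuffled pool.

    Assumption: the multilingual dataset is the union of the resource-rich
    and resource-poor pairs; the truncated abstract does not confirm this.
    """
    combined = list(rich) + list(poor)
    random.Random(seed).shuffle(combined)  # deterministic shuffle for the sketch
    return combined


def training_plan(rich, poor):
    """Return (stage name, dataset) pairs in training order.

    Stage 1 exposes the encoder to every language; stage 2 uses only the
    resource-poor language so the decoder specializes in it.
    """
    return [
        ("pretrain_encoder_multilingual", build_multilingual_dataset(rich, poor)),
        ("train_decoder_resource_poor", list(poor)),
    ]


if __name__ == "__main__":
    rich = [Sample("en_0.png", "exit", "en"), Sample("en_1.png", "open", "en")]
    poor = [Sample("ja_0.png", "出口", "ja")]
    for stage, data in training_plan(rich, poor):
        print(stage, sorted({s.lang for s in data}), len(data))
```

In a real system each stage would drive a training loop over an encoder-decoder model; the sketch only makes explicit which data each component sees.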

Taxonomy

Methods

Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dense Connections · Softmax · Residual Connection · Adam