Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Shancheng Fang; Hongtao Xie; Yuxin Wang; Zhendong Mao; Yongdong Zhang

arXiv:2103.06495 · cs.CV · March 12, 2021 · 21 cites

TL;DR

This paper introduces ABINet, a novel scene text recognition model that explicitly models language bidirectionally and iteratively, significantly improving recognition accuracy especially on low-quality images.

Contribution

The paper proposes ABINet, which incorporates explicit language modeling, bidirectional feature representation, and iterative correction, advancing scene text recognition beyond implicit, unidirectional, and noise-sensitive models.

Findings

01 Achieves state-of-the-art results on multiple benchmarks.

02 Excels in recognizing low-quality and noisy images.

03 Self-training enhances learning from unlabeled data.

Abstract

Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicit language modeling; 2) unidirectional feature representation; and 3) language models fed noisy input. Correspondingly, we propose ABINet, an autonomous, bidirectional and iterative network for scene text recognition. Firstly, "autonomous" means blocking gradient flow between the vision and language models to enforce explicit language modeling. Secondly, a novel bidirectional cloze network (BCN) is proposed as the language model, based on bidirectional feature representation. Thirdly, we propose iterative correction as the execution manner for the language model, which effectively alleviates the impact of noisy input. Additionally,…
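The cloze-style prediction and iterative correction described in the abstract can be illustrated with a small toy. This is only a sketch under strong assumptions, not the authors' implementation: a hypothetical four-word lexicon stands in for the learned BCN, per-character probability dictionaries stand in for the vision model's output, and fusion is a simple normalized product. All names (`LEXICON`, `cloze_language_model`, `fuse`, `recognize`) are invented for illustration. Gradient blocking is not shown, since the toy has no trainable parameters.

```python
import string

# Hypothetical toy lexicon standing in for learned linguistic knowledge.
LEXICON = ["house", "horse", "mouse", "shown"]
ALPHABET = string.ascii_lowercase
EPS = 1e-6  # small floor so unseen characters keep nonzero probability


def cloze_language_model(probs):
    """For each position i, predict characters from all positions except i,
    so both left and right context are used (cloze-style masking)."""
    length = len(probs)
    out = []
    for i in range(length):
        scores = {c: EPS for c in ALPHABET}
        for word in LEXICON:
            if len(word) != length:
                continue
            # Support for this word from context positions only (j != i).
            support = 1.0
            for j in range(length):
                if j != i:
                    support *= probs[j].get(word[j], EPS)
            scores[word[i]] += support
        total = sum(scores.values())
        out.append({c: s / total for c, s in scores.items()})
    return out


def fuse(vision, language):
    """Combine vision and language predictions by a normalized product."""
    fused = []
    for v, l in zip(vision, language):
        prod = {c: v.get(c, EPS) * l[c] for c in ALPHABET}
        total = sum(prod.values())
        fused.append({c: p / total for c, p in prod.items()})
    return fused


def decode(probs):
    """Pick the most probable character at each position."""
    return "".join(max(p, key=p.get) for p in probs)


def recognize(vision, iterations=3):
    """Feed the fused prediction back into the language model repeatedly,
    while the vision prediction stays fixed."""
    current = vision
    for _ in range(iterations):
        language = cloze_language_model(current)
        current = fuse(vision, language)
    return decode(current)


# A noisy vision prediction: the third character is misread as 'v'.
vision = [
    {"h": 0.9},
    {"o": 0.9},
    {"v": 0.5, "u": 0.4},
    {"s": 0.9},
    {"e": 0.9},
]

print(decode(vision))     # hovse
print(recognize(vision))  # house
```

In this toy, the context alone cannot separate "house" from "horse" at the third position, so the fused result relies on the vision model's residual preference for "u" over "r". That is the motivation for combining both sources rather than trusting either one.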

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Multimodal Machine Learning Applications