Scene Text Recognition from Two-Dimensional Perspective
Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai

TL;DR
This paper introduces CA-FCN, a two-dimensional fully convolutional network for scene text recognition that effectively handles arbitrary-shaped text by leveraging semantic segmentation and attention mechanisms, outperforming previous methods.
Contribution
The paper proposes a novel 2D perspective for scene text recognition using CA-FCN, which improves accuracy and robustness over traditional sequence-based methods.
Findings
Outperforms previous methods on regular and irregular text datasets.
More robust to imprecise localizations in text detection.
Effective recognition of arbitrary-shaped text.
Abstract
Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem. Though achieving excellent performance, these methods usually neglect an important fact that text in images is actually distributed in two-dimensional space. It is a nature quite different from that of speech, which is essentially a one-dimensional signal. In principle, directly compressing features of text into a one-dimensional form may lose useful information and introduce extra noise. In this paper, we approach scene text recognition from a two-dimensional perspective. A simple yet effective model, called Character Attention Fully Convolutional Network (CA-FCN), is devised for recognizing the text of arbitrary shapes. Scene text recognition is realized with a semantic segmentation network, where an attention mechanism for characters is adopted.…
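To make the two-dimensional idea concrete, the following is a minimal sketch of how a per-pixel character segmentation map could be turned into a string. It is an illustration under stated assumptions, not the paper's implementation: the function name decode_segmentation, the integer label encoding (0 for background, k for the k-th character of the alphabet), the 4-connected region grouping, the majority vote, and the left-to-right ordering are all hypothetical choices made here for clarity.

```python
from collections import Counter, deque


def decode_segmentation(label_map, alphabet):
    """Turn a 2D map of per-pixel class indices into a string.

    label_map: list of equal-length rows of ints; 0 is background and
               k > 0 stands for alphabet[k - 1] (hypothetical encoding).
    alphabet:  string of recognizable characters.
    """
    height = len(label_map)
    width = len(label_map[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    characters = []  # one (mean_x, character) pair per foreground region

    for y in range(height):
        for x in range(width):
            if label_map[y][x] == 0 or seen[y][x]:
                continue
            # Flood-fill one 4-connected region of foreground pixels.
            queue = deque([(y, x)])
            seen[y][x] = True
            votes = Counter()
            xs = []
            while queue:
                cy, cx = queue.popleft()
                votes[label_map[cy][cx]] += 1
                xs.append(cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and label_map[ny][nx] != 0):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Majority vote smooths over a few mislabeled pixels in a region.
            winner = votes.most_common(1)[0][0]
            characters.append((sum(xs) / len(xs), alphabet[winner - 1]))

    # Assumption: reading order is left to right by horizontal center.
    characters.sort(key=lambda item: item[0])
    return "".join(ch for _, ch in characters)
```

Because each character is located as a region in the image plane, characters at different heights (as in curved text) are handled without first flattening the features into a single row. The left-to-right sort is the simplest ordering rule and would not suit vertical text.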
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques
