Primitive Representation Learning for Scene Text Recognition

Ruijie Yan; Liangrui Peng; Shanyu Xiao; Gang Yao

arXiv:2105.04286 · cs.CV · May 11, 2021 · 5 cites

TL;DR

This paper introduces a primitive representation learning approach for scene text recognition that models the elements of feature maps as nodes of a graph, aiming to improve both accuracy and efficiency, particularly on multi-oriented text. It also proposes PREN2D, an extended framework that incorporates 2D attention.

Contribution

It presents a novel graph-based method for learning primitive representations, realized in the PREN network, and PREN2D, an extension of PREN that adds 2D attention to further improve recognition accuracy.

Findings

01

PREN achieves a good balance between accuracy and efficiency.

02

PREN2D attains state-of-the-art results on English and Chinese datasets.

03

The proposed methods effectively handle multi-oriented scene texts.

Abstract

Scene text recognition is a challenging task due to diverse variations of text instances in natural scene images. Conventional methods based on CNN-RNN-CTC or encoder-decoder with attention mechanism may not fully investigate stable and efficient feature representations for multi-oriented scene texts. In this paper, we propose a primitive representation learning method that aims to exploit intrinsic representations of scene text images. We model elements in feature maps as the nodes of an undirected graph. A pooling aggregator and a weighted aggregator are proposed to learn primitive representations, which are transformed into high-level visual text representations by graph convolutional networks. A Primitive REpresentation learning Network (PREN) is constructed to use the visual text representations for parallel decoding. Furthermore, by integrating visual text representations into an…
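The abstract describes the pipeline only at a high level, so the following NumPy snippet is a minimal, hypothetical sketch of the ideas it names, not the authors' PREN implementation. It treats each element of a d×h×w feature map as a graph node and assumes a fully connected graph, so both aggregators pool over all nodes: the pooling aggregator projects and averages node features, and the weighted aggregator takes softmax-weighted sums of them. It then applies one graph-convolution-style layer, ReLU(B P W_g), to map n primitive representations to L visual text representations, and classifies all L positions at once. The function names, the simple averaging of the two aggregators, the mixing matrix B, the random weights, and all sizes are assumptions made for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x, axis):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def feature_map_to_nodes(fmap):
    # fmap: (d, h, w) -> (h*w, d); each spatial element becomes one graph node.
    d, h, w = fmap.shape
    return fmap.reshape(d, h * w).T


def pooling_aggregator(nodes, W_p, n):
    # Hypothetical pooling aggregator: project every node into n slots with
    # W_p of shape (d, n*d), then average over all nodes. Returns (n, d).
    d = nodes.shape[1]
    projected = relu(nodes @ W_p)               # (N, n*d)
    return projected.mean(axis=0).reshape(n, d)


def weighted_aggregator(nodes, W_a):
    # Hypothetical weighted aggregator: score each node for each of n
    # primitives, normalize scores over nodes, take weighted sums. Returns (n, d).
    weights = softmax(nodes @ W_a, axis=0)      # (N, n); each column sums to 1
    return weights.T @ nodes


def gcn_layer(P, B, W_g):
    # One graph-convolution-style step (assumed form): mix n primitives into
    # L text slots with B (L, n), then transform features with W_g (d, d).
    return relu(B @ P @ W_g)                    # (L, d)


def parallel_decode(Y, W_cls):
    # Classify all L slots in one matrix product (no autoregressive loop).
    return (Y @ W_cls).argmax(axis=1)           # (L,) class indices


rng = np.random.default_rng(0)
d, h, w = 8, 4, 10   # hypothetical feature-map channels, height, width
n, L, C = 5, 6, 12   # hypothetical #primitives, max text length, #classes

fmap = rng.standard_normal((d, h, w))
nodes = feature_map_to_nodes(fmap)
P_pool = pooling_aggregator(nodes, rng.standard_normal((d, n * d)), n)
P_wgt = weighted_aggregator(nodes, rng.standard_normal((d, n)))
P = 0.5 * (P_pool + P_wgt)  # assumption: simple average of both aggregators
Y = gcn_layer(P, rng.standard_normal((L, n)), rng.standard_normal((d, d)))
pred = parallel_decode(Y, rng.standard_normal((d, C)))
print(P.shape, Y.shape, pred.shape)  # (5, 8) (6, 8) (6,)
```

Because every output position is produced by the same matrix products, decoding involves no step-by-step loop over characters, which is the general reason parallel decoders tend to be faster than autoregressive ones. In a trained model the weight matrices would be learned rather than sampled at random.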

Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques