An end-to-end TextSpotter with Explicit Alignment and Attention

Tong He; Zhi Tian; Weilin Huang; Chunhua Shen; Yu Qiao; Changming Sun

arXiv:1803.03474 · cs.CV · March 26, 2018 · 31 cites

TL;DR

This paper introduces an end-to-end framework for text detection and recognition in natural images, utilizing explicit alignment and attention mechanisms to improve accuracy and efficiency.

Contribution

It presents a novel text-alignment layer and a character attention mechanism with explicit supervision, and integrates both with a new RNN branch into a unified, end-to-end trainable model.
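The text-alignment layer is described here only at a high level. To illustrate the general idea, the sketch below samples a fixed-size feature grid from inside a rotated box using bilinear interpolation, so that an arbitrarily oriented text instance produces an axis-aligned feature patch. It is a generic oriented-RoI sampling sketch, not the paper's layer. The box parameterisation (centre, width, height, angle in radians), the single-channel feature map, the zero padding outside the map, and the names `bilinear` and `align_oriented_box` are all assumptions.

```python
import math
from typing import List

# Hypothetical single-channel feature map, indexed feat[row][col].
Grid = List[List[float]]


def bilinear(feat: Grid, x: float, y: float) -> float:
    """Bilinearly interpolate feat at continuous (x, y); points outside the map read as 0."""
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight shrinks linearly with distance from the neighbouring pixel.
                weight = (1 - abs(x - xx)) * (1 - abs(y - yy))
                value += weight * feat[yy][xx]
    return value


def align_oriented_box(feat: Grid, cx: float, cy: float, box_w: float, box_h: float,
                       theta: float, out_h: int, out_w: int) -> Grid:
    """Sample an out_h x out_w grid from a box centred at (cx, cy) of size box_w x box_h,
    rotated by theta radians (with y pointing down, positive theta rotates clockwise on screen)."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out: Grid = []
    for i in range(out_h):
        # Vertical offset of the i-th bin centre from the box centre, in the box's own frame.
        v = ((i + 0.5) / out_h - 0.5) * box_h
        row = []
        for j in range(out_w):
            # Horizontal offset of the j-th bin centre from the box centre.
            u = ((j + 0.5) / out_w - 0.5) * box_w
            # Rotate the local offset (u, v) by theta, then translate to the box centre.
            x = cx + u * cos_t - v * sin_t
            y = cy + u * sin_t + v * cos_t
            row.append(bilinear(feat, x, y))
        out.append(row)
    return out


if __name__ == "__main__":
    # A horizontal ramp: the feature value equals the column index.
    ramp = [[float(x) for x in range(5)] for _ in range(5)]
    print(align_oriented_box(ramp, 2.0, 2.0, 4.0, 2.0, 0.0, 1, 4))  # [[0.5, 1.5, 2.5, 3.5]]
```

Each output value is a linear function of the feature map, so gradients from a downstream recognition branch can flow back through a sampler of this kind. That is the usual reason for choosing bilinear over nearest-neighbour sampling in such layers.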

Findings

1. Achieved state-of-the-art end-to-end recognition results on ICDAR2015.
2. Significant improvements in F-measure over previous methods.
3. The model also achieves state-of-the-art text detection performance.

Abstract

Text detection and recognition in natural images have long been considered two separate tasks that are processed sequentially. Training the two tasks in a unified framework is non-trivial due to significant differences in optimisation difficulty. In this work, we present a conceptually simple yet efficient framework that processes the two tasks simultaneously in one shot. Our main contributions are three-fold: 1) we propose a novel text-alignment layer that precisely computes convolutional features of a text instance in arbitrary orientation, which is the key to boosting performance; 2) a character attention mechanism is introduced by using character spatial information as explicit supervision, leading to large improvements in recognition; 3) these two techniques, together with a new RNN branch for word recognition, are integrated seamlessly into a single model which…
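The abstract does not spell out the form of the explicit attention supervision. One simple way to use character spatial information, shown here purely as an assumption, is to penalise the squared distance between the attention-weighted expected position at each decoding step and the ground-truth centre of the corresponding character along the aligned feature sequence. The function names, the squared-error form, and the use of column indices as positions are hypothetical; the paper's actual loss may differ.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_position_loss(attn: Sequence[Sequence[float]],
                            centers: Sequence[float]) -> float:
    """Hypothetical supervision term: mean squared distance between each decoding
    step's attention-weighted position and the ground-truth character centre.

    attn[t][j] is the attention weight of step t on feature column j (rows sum to 1);
    centers[t] is the ground-truth centre of character t, in feature-column units.
    """
    if len(attn) != len(centers) or not centers:
        raise ValueError("need one non-empty attention row per ground-truth character")
    total = 0.0
    for weights, k in zip(attn, centers):
        # Expected column under the attention distribution for this step.
        expected_pos = sum(a * j for j, a in enumerate(weights))
        total += (expected_pos - k) ** 2
    return total / len(centers)


if __name__ == "__main__":
    uniform = softmax([0.0, 0.0, 0.0, 0.0])  # [0.25, 0.25, 0.25, 0.25]
    print(attention_position_loss([uniform], [0.5]))  # 1.0
```

During training, a term like this would typically be added to the word-recognition loss with a weighting factor; that weighting, too, is an assumption rather than a detail taken from the paper.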

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Image Processing and 3D Reconstruction