An end-to-end TextSpotter with Explicit Alignment and Attention
Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun

TL;DR
This paper introduces an end-to-end framework for text detection and recognition in natural images, utilizing explicit alignment and attention mechanisms to improve accuracy and efficiency.
Contribution
It presents a novel text-alignment layer, a character attention mechanism with explicit supervision, and integrates these with a new RNN branch into a unified, trainable model.
Findings
Achieved state-of-the-art end-to-end recognition results on ICDAR2015.
Significant improvements in F-measure over previous methods.
Model also sets new benchmarks in text detection performance.
Abstract
Text detection and recognition in natural images have long been considered as two separate tasks that are processed sequentially. Training of two tasks in a unified framework is non-trivial due to significant differences in optimisation difficulties. In this work, we present a conceptually simple yet efficient framework that simultaneously processes the two tasks in one shot. Our main contributions are three-fold: 1) we propose a novel text-alignment layer that allows it to precisely compute convolutional features of a text instance in arbitrary orientation, which is the key to boost the performance; 2) a character attention mechanism is introduced by using character spatial information as explicit supervision, leading to large improvements in recognition; 3) two technologies, together with a new RNN branch for word recognition, are integrated seamlessly into a single model which…
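To make the idea of computing features for a text instance in arbitrary orientation more concrete, the sketch below samples a fixed-size grid from inside a rotated box on a 2D feature map using bilinear interpolation. It illustrates the general technique only and is not the paper's text-alignment layer: the function names `bilinear` and `sample_rotated_box`, the box parameterisation (centre, width, height, angle in radians), and the choice to sample at bin centres are all assumptions made for this example.

```python
import math


def bilinear(feat, x, y):
    """Bilinearly interpolate a 2D feature map (list of rows) at real-valued (x, y)."""
    h, w = len(feat), len(feat[0])
    # Clamp to the map bounds so that samples near the border stay defined.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def sample_rotated_box(feat, cx, cy, box_w, box_h, angle, out_w, out_h):
    """Sample an out_h x out_w grid from a box centred at (cx, cy), rotated by `angle`.

    Hypothetical helper for illustration: each output cell takes one bilinear
    sample at the centre of its bin inside the rotated box.
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for j in range(out_h):
        # Offset along the box's own vertical axis.
        v = ((j + 0.5) / out_h - 0.5) * box_h
        row = []
        for i in range(out_w):
            # Offset along the box's own horizontal axis.
            u = ((i + 0.5) / out_w - 0.5) * box_w
            # Rotate the local offset (u, v) into feature-map coordinates.
            x = cx + u * cos_a - v * sin_a
            y = cy + u * sin_a + v * cos_a
            row.append(bilinear(feat, x, y))
        out.append(row)
    return out


if __name__ == "__main__":
    # Toy feature map whose value encodes its own position: feat[y][x] = x + 10 * y.
    feat = [[x + 10 * y for x in range(8)] for y in range(8)]
    for row in sample_rotated_box(feat, 3.5, 3.5, 4, 2, math.pi / 6, 4, 2):
        print([round(value, 2) for value in row])
```

Because the sampling points follow the box's own axes instead of the image axes, the output grid has the same shape whatever the orientation of the box, which is what lets a downstream recognition branch consume it.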
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Image Processing and 3D Reconstruction
