Single Shot Text Detector with Regional Attention
Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, Xiaolin Li

TL;DR
This paper introduces a single-shot text detection method using regional attention and hierarchical inception modules, achieving state-of-the-art accuracy on the ICDAR 2015 benchmark.
Contribution
It proposes a novel attention mechanism and hierarchical inception module for robust, single-scale, multi-orientation text detection in natural images.
Findings
Achieved 77% F-measure on ICDAR 2015 benchmark
Outperformed recent FCN-based text detectors
Effective at detecting small and multi-scale text
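For readers unfamiliar with the metric, F-measure in detection benchmarks such as ICDAR 2015 is the harmonic mean of precision and recall. The sketch below shows that arithmetic only. The function name `f_measure` and the input values are illustrative assumptions, not figures reported by the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        # No correct detections at all: define the score as zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical inputs chosen for easy arithmetic, not taken from the paper.
example_score = f_measure(precision=0.5, recall=1.0)
print(round(example_score, 4))
```

Because the harmonic mean is dominated by the smaller of the two values, a detector cannot reach a high F-measure by trading away recall for precision or the reverse.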
Abstract
We present a novel single-shot text detector that directly outputs word-level bounding boxes in a natural image. We propose an attention mechanism which roughly identifies text regions via an automatically learned attentional map. This substantially suppresses background interference in the convolutional features, which is the key to producing accurate inference of words, particularly at extremely small sizes. This results in a single model that essentially works in a coarse-to-fine manner. It departs from recent FCN-based text detectors which cascade multiple FCN models to achieve an accurate prediction. Furthermore, we develop a hierarchical inception module which efficiently aggregates multi-scale inception features. This enhances local details, and also encodes strong context information, allowing the detector to work reliably on multi-scale and multi-orientation text with…
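The background-suppression idea in the abstract can be illustrated with a minimal sketch. It assumes the attentional map is a per-pixel text probability, obtained from a two-class softmax over background and text scores, and that it is multiplied element-wise with every feature channel. The function names, tensor layout, and toy values are hypothetical and chosen for illustration; the abstract does not specify how the map is computed or applied.

```python
import math


def text_probability(bg_score: float, text_score: float) -> float:
    """Two-class softmax; returns the probability of the 'text' class."""
    m = max(bg_score, text_score)  # subtract the max for numerical stability
    e_bg = math.exp(bg_score - m)
    e_text = math.exp(text_score - m)
    return e_text / (e_bg + e_text)


def apply_attention(features, bg_scores, text_scores):
    """Weight each feature channel by a per-pixel text probability.

    features: nested list shaped [C][H][W]
    bg_scores, text_scores: nested lists shaped [H][W]
    Returns (attended_features, attention_map).
    """
    attention = [
        [text_probability(b, t) for b, t in zip(bg_row, text_row)]
        for bg_row, text_row in zip(bg_scores, text_scores)
    ]
    attended = [
        [
            [value * weight for value, weight in zip(f_row, a_row)]
            for f_row, a_row in zip(channel, attention)
        ]
        for channel in features
    ]
    return attended, attention


# Toy input (assumed values): one channel over a 2x2 grid.
features = [[[1.0, 1.0], [1.0, 1.0]]]
bg_scores = [[4.0, -4.0], [0.0, 4.0]]
text_scores = [[-4.0, 4.0], [0.0, -4.0]]

attended, attention = apply_attention(features, bg_scores, text_scores)
for row in attended[0]:
    print([round(v, 3) for v in row])
```

In this toy input, positions where the background score dominates are scaled close to zero, while positions with a dominant text score pass through almost unchanged. That is the sense in which an attentional map can suppress background interference before boxes are predicted.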
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Video Analysis and Summarization
Methods: Convolution · 1x1 Convolution · Max Pooling · Inception Module · Fully Convolutional Network
