Single Shot Text Detector with Regional Attention

Pan He; Weilin Huang; Tong He; Qile Zhu; Yu Qiao; Xiaolin Li

arXiv:1709.00138 · cs.CV · September 4, 2017 · 58 citations

TL;DR

This paper introduces a single-shot text detection method using regional attention and hierarchical inception modules, achieving state-of-the-art accuracy on the ICDAR 2015 benchmark.

Contribution

It proposes a novel attention mechanism and a hierarchical inception module for robust detection of multi-scale, multi-orientation text in natural images, using single-scale input.

Findings

01. Achieved a 77% F-measure on the ICDAR 2015 benchmark
02. Outperformed recent FCN-based text detectors
03. Effective at detecting small and multi-scale text
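The F-measure in the first finding is, in the usual text-detection sense, the harmonic mean of precision (the fraction of predicted word boxes that match a ground-truth word) and recall (the fraction of ground-truth words that are found). The short Python sketch below computes it. The counts and precision/recall values are hypothetical, not results from the paper, and the rule for deciding when a box "matches" (an overlap criterion fixed by the benchmark protocol) is deliberately left out.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f_from_counts(matched: int, num_detections: int, num_ground_truth: int) -> float:
    """F-measure from box counts after detections are matched to ground truth.

    precision = matched / detections, recall = matched / ground-truth words.
    How a detection counts as 'matched' depends on the benchmark protocol
    and is assumed to have been decided already.
    """
    precision = matched / num_detections if num_detections else 0.0
    recall = matched / num_ground_truth if num_ground_truth else 0.0
    return f_measure(precision, recall)


# Hypothetical numbers for illustration only.
print(round(f_measure(0.9, 0.6), 4))         # 0.72, below the arithmetic mean of 0.75
print(round(f_from_counts(60, 80, 100), 4))  # 0.6667 (precision 0.75, recall 0.6)
```

The harmonic mean is pulled toward the weaker of the two scores, so a detector cannot reach a high F-measure by trading away recall for precision, or the reverse.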

Abstract

We present a novel single-shot text detector that directly outputs word-level bounding boxes in a natural image. We propose an attention mechanism which roughly identifies text regions via an automatically learned attentional map. This substantially suppresses background interference in the convolutional features, which is the key to producing accurate inference of words, particularly at extremely small sizes. This results in a single model that essentially works in a coarse-to-fine manner. It departs from recent FCN-based text detectors which cascade multiple FCN models to achieve an accurate prediction. Furthermore, we develop a hierarchical inception module which efficiently aggregates multi-scale inception features. This enhances local details, and also encodes strong context information, allowing the detector to work reliably on multi-scale and multi-orientation text with…
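The abstract describes the attention mechanism only at a high level: a learned map that roughly marks text regions and suppresses background in the convolutional features. One simple way to realise that idea is sketched below in NumPy, assuming a 1x1-convolution text/background classifier whose softmax text probability rescales the features elementwise. The function names, shapes, and random weights are illustrative assumptions, not the SSTD implementation; in the paper the map is learned, and details such as how it is supervised are not covered on this page.

```python
import numpy as np


def conv1x1(features, weight, bias):
    """1x1 convolution: the same linear map C_in -> C_out applied at every pixel.

    features: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)
    """
    return np.einsum("oc,chw->ohw", weight, features) + bias[:, None, None]


def text_attention(features, weight, bias):
    """Hypothetical regional-attention step (names and layout are assumptions).

    A 1x1 convolution predicts (background, text) logits per pixel, a softmax
    over those two channels gives a text probability in [0, 1], and every
    feature channel is multiplied by that probability, so positions scored
    as background are pushed toward zero.
    """
    logits = conv1x1(features, weight, bias)             # (2, H, W)
    logits = logits - logits.max(axis=0, keepdims=True)  # stabilise exp
    exp = np.exp(logits)
    text_prob = exp[1] / exp.sum(axis=0)                 # (H, W)
    attended = features * text_prob[None, :, :]          # broadcast over channels
    return attended, text_prob


# Usage with random placeholder weights (a trained model would learn them).
rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 8, 8))  # 16 channels on an 8x8 grid
w = rng.standard_normal((2, 16)) * 0.1
b = np.zeros(2)
attended, text_prob = text_attention(feats, w, b)
print(attended.shape, text_prob.shape)  # (16, 8, 8) (8, 8)
```

Because the map multiplies rather than replaces the features, channels keep their relative values inside likely text regions while background positions shrink toward zero.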

Code & Models

Repositories

BestSonny/SSTD

Videos

Single Shot Text Detector with Regional Attention (YouTube)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Video Analysis and Summarization

Methods: Convolution · 1x1 Convolution · Max Pooling · Inception Module · Fully Convolutional Network
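The methods listed above (1x1 convolution, max pooling, inception module) combine in a standard inception block: parallel branches with different receptive fields whose outputs are concatenated along the channel axis. The NumPy sketch below assumes a GoogLeNet-style three-branch layout with hypothetical channel widths and random placeholder weights. The paper's hierarchical inception module additionally aggregates such multi-scale features, but its exact branches and aggregation are not given on this page and are not reproduced here.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """Per-pixel linear map. x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def conv3x3(x, weight, bias):
    """Zero-padded 3x3 convolution (cross-correlation), stride 1, same output size.

    weight: (C_out, C_in, 3, 3), bias: (C_out,)
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            shifted = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], shifted)
    return out + bias[:, None, None]


def maxpool3x3(x):
    """3x3 max pooling, stride 1, padded with -inf so the spatial size is preserved."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    windows = [padded[:, dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.max(np.stack(windows), axis=0)


def relu(x):
    return np.maximum(x, 0.0)


def inception_block(x, params):
    """Three parallel branches with different receptive fields, concatenated on channels."""
    branch_1x1 = relu(conv1x1(x, *params["1x1"]))
    branch_3x3 = relu(conv3x3(relu(conv1x1(x, *params["3x3_reduce"])), *params["3x3"]))
    branch_pool = relu(conv1x1(maxpool3x3(x), *params["pool_proj"]))
    return np.concatenate([branch_1x1, branch_3x3, branch_pool], axis=0)


def init_params(c_in, rng, c1=8, c_reduce=4, c3=8, c_pool=8):
    """Random placeholder weights; the branch widths are arbitrary choices."""
    def w(*shape):
        return rng.standard_normal(shape) * 0.1
    return {
        "1x1": (w(c1, c_in), np.zeros(c1)),
        "3x3_reduce": (w(c_reduce, c_in), np.zeros(c_reduce)),
        "3x3": (w(c3, c_reduce, 3, 3), np.zeros(c3)),
        "pool_proj": (w(c_pool, c_in), np.zeros(c_pool)),
    }


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 10, 12))
y = inception_block(x, init_params(16, rng))
print(y.shape)  # (24, 10, 12): 8 + 8 + 8 channels, spatial size unchanged
```

The 1x1 "reduce" layer in front of the 3x3 branch is the usual inception trick for cutting computation, and the max-pool branch is projected with a 1x1 convolution so that every branch contributes a chosen number of channels to the concatenation.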