Pixel-Anchor: A Fast Oriented Scene Text Detector with Combined Networks

Yuan Li; Yuanjie Yu; Zefeng Li; Yangkun Lin; Meifang Xu; Jiwei Li; Xi Zhou

arXiv:1811.07432·cs.CV·November 20, 2018·27 cites

TL;DR

Pixel-Anchor is a novel end-to-end deep neural network that combines semantic segmentation and SSD to efficiently detect oriented scene text with high accuracy and speed, outperforming existing methods on standard benchmarks.

Contribution

The paper introduces Pixel-Anchor, a unified network integrating semantic segmentation and SSD with feature sharing and attention, improving scene text detection accuracy and efficiency.

Findings

01. Achieves an F-score of 0.8768 on ICDAR 2015

02. Runs at 10 FPS for high-resolution images

03. Outperforms existing methods in accuracy and speed
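
For readers unfamiliar with the metric, the F-score reported above is conventionally the harmonic mean of detection precision and recall. The sketch below shows that arithmetic only; the function name `f_score` is hypothetical, and no precision or recall values from the paper are assumed.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the usual F1 definition)."""
    if precision + recall == 0:
        # Both are zero: define the score as 0 to avoid division by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector cannot reach a high F-score by trading recall away for precision or the reverse.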

Abstract

Recently, semantic segmentation and general object detection frameworks have been widely adopted for scene text detection tasks. However, each of them has obvious shortcomings in practice when used alone. In this paper, we propose a novel end-to-end trainable deep neural network framework, named Pixel-Anchor, which combines semantic segmentation and SSD in one network through feature sharing and an anchor-level attention mechanism to detect oriented scene text. To deal with scene text that varies widely in size and aspect ratio, we combine FPN and ASPP operations as our encoder-decoder structure in the semantic segmentation part, and propose a novel Adaptive Predictor Layer in the SSD. Pixel-Anchor detects scene text in a single network forward pass; no complex post-processing other than an efficient fusion Non-Maximum Suppression is involved. We have benchmarked the proposed Pixel-Anchor on the…
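
The abstract names Non-Maximum Suppression as the only post-processing step but does not describe the fusion variant in detail. As background, here is a minimal sketch of plain greedy NMS on axis-aligned boxes. It is not the paper's fusion NMS, which handles oriented text boxes; the names `iou`, `greedy_nms`, and the default `iou_threshold` of 0.5 are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    Standard greedy NMS (an illustrative stand-in, not the paper's method):
    visit boxes in descending score order and keep a box only if it does not
    overlap any already-kept box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, given two heavily overlapping boxes and one distant box, the lower-scoring member of the overlapping pair is suppressed and the other two indices are returned.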

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Vehicle License Plate Recognition

Methods: Convolution · Non Maximum Suppression · Dilated Convolution · Spatial Pyramid Pooling · 1x1 Convolution · Atrous Spatial Pyramid Pooling · SSD