Shape Robust Text Detection with Progressive Scale Expansion Network
Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai, Shao

TL;DR
This paper introduces PSENet, a novel text detection method that accurately detects arbitrary-shaped text instances and effectively separates close text objects using progressive scale expansion.
Contribution
The paper proposes a new segmentation-based network that generates multi-scale kernels and progressively expands them to improve arbitrary-shaped text detection and separation of close instances.
Findings
Achieves 74.3% F-measure on CTW1500 at 27 FPS
Outperforms state-of-the-art algorithms by 6.6% F-measure
Validates effectiveness on multiple challenging datasets
Abstract
Scene text detection has witnessed rapid progress especially with the recent development of convolutional neural networks. However, there still exists two challenges which prevent the algorithm into industry applications. On the one hand, most of the state-of-art algorithms require quadrangle bounding box which is in-accurate to locate the texts with arbitrary shape. On the other hand, two text instances which are close to each other may lead to a false detection which covers both instances. Traditionally, the segmentation-based approach can relieve the first problem but usually fail to solve the second challenge. To address these two challenges, in this paper, we propose a novel Progressive Scale Expansion Network (PSENet), which can precisely detect text instances with arbitrary shapes. More specifically, PSENet generates the different scale of kernels for each text instance, and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHandwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
MethodsAverage Pooling · Residual Connection · *Communicated@Fast*How Do I Communicate to Expedia? · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling
