Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

Wenhai Wang; Enze Xie; Xiaoge Song; Yuhang Zang; Wenjia Wang; Tong Lu; Gang Yu; Chunhua Shen

arXiv:1908.05900·cs.CV·August 4, 2020·75 cites

TL;DR

This paper introduces the Pixel Aggregation Network (PAN), an efficient and accurate method for detecting arbitrary-shaped text in scenes, balancing speed and precision with a novel segmentation and post-processing approach.

Contribution

The paper proposes a new text detection framework combining a low-cost segmentation head and learnable post-processing for improved speed and accuracy.

Findings

01. Achieves 79.9% F-measure on CTW1500

02. Runs at 84.2 FPS, demonstrating real-time capability

03. Outperforms existing methods on standard benchmarks
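For readers unfamiliar with the metric in the first finding, the F-measure is the harmonic mean of precision and recall. The sketch below is only an illustration of that standard definition; the function name `f_measure` and the counts in the usage line are hypothetical and are not taken from the paper or its evaluation code.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    detected = true_positives + false_positives      # all predicted text instances
    ground_truth = true_positives + false_negatives  # all annotated text instances
    if true_positives == 0 or detected == 0 or ground_truth == 0:
        return 0.0
    precision = true_positives / detected
    recall = true_positives / ground_truth
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
score = f_measure(true_positives=80, false_positives=20, false_negatives=20)
print(f"F-measure: {score:.3f}")
```

Because it is a harmonic mean, the score stays low unless both precision and recall are high, which is why it is commonly reported as a single summary number for detection benchmarks.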

Abstract

Scene text detection, an important step of scene text reading systems, has witnessed rapid development with convolutional neural networks. Nonetheless, two main challenges still exist and hamper its deployment to real-world applications. The first problem is the trade-off between speed and accuracy. The second one is to model the arbitrary-shaped text instance. Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing. More specifically, the segmentation head is made up of Feature Pyramid Enhancement Module (FPEM) and…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Advanced Image and Video Retrieval Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings