Detecting Text in Natural Image with Connectionist Text Proposal Network
Zhi Tian, Weilin Huang, Tong He, Pan He, and Yu Qiao

TL;DR
The paper introduces a Connectionist Text Proposal Network (CTPN) that accurately detects text lines in natural images using a novel, end-to-end trainable deep learning model that integrates convolutional features with sequential proposals.
Contribution
It presents the CTPN model, which combines a vertical anchor mechanism with a recurrent neural network to detect text precisely across scales and languages, without complex post-processing.
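To make the vertical anchor idea concrete, here is a minimal sketch in plain Python. It assumes every proposal has the same fixed width and that only its vertical centre and height are regressed relative to an anchor. The constants (a 16-pixel width, 10 anchors, heights starting at 11 pixels and growing by a factor of 1/0.7) and all function names are illustrative assumptions, not values or APIs given in this summary.

```python
import math

# Assumed fixed proposal width in pixels (illustrative, not stated in this summary).
ANCHOR_WIDTH = 16


def vertical_anchor_heights(k=10, min_height=11.0, ratio=0.7):
    """Return k anchor heights, each one the previous height divided by `ratio`."""
    return [min_height / (ratio ** i) for i in range(k)]


def decode_vertical(anchor_cy, anchor_h, v_c, v_h):
    """Map relative vertical predictions back to absolute (top, bottom) coordinates.

    v_c is the centre shift in units of the anchor height;
    v_h is the log of the ratio between predicted and anchor height.
    """
    cy = v_c * anchor_h + anchor_cy
    h = math.exp(v_h) * anchor_h
    return cy - h / 2.0, cy + h / 2.0


def proposal_box(col_index, anchor_cy, anchor_h, v_c, v_h):
    """Build an (x1, y1, x2, y2) box for the fixed-width slice at column `col_index`."""
    x1 = col_index * ANCHOR_WIDTH
    top, bottom = decode_vertical(anchor_cy, anchor_h, v_c, v_h)
    return (x1, top, x1 + ANCHOR_WIDTH, bottom)
```

Because the horizontal extent is fixed by the column index, the model in this sketch only has to predict two numbers per anchor, which is one plausible reading of why a vertical-only parameterization can localize text more reliably than regressing a full box.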
Findings
Achieves high F-measure on ICDAR benchmarks
Operates efficiently at 0.14 seconds per image
Outperforms recent state-of-the-art methods
Abstract
We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural images. The CTPN detects a text line in a sequence of fine-scale text proposals directly in convolutional feature maps. We develop a vertical anchor mechanism that jointly predicts the location and text/non-text score of each fixed-width proposal, considerably improving localization accuracy. The sequential proposals are naturally connected by a recurrent neural network, which is seamlessly incorporated into the convolutional network, resulting in an end-to-end trainable model. This allows the CTPN to explore rich context information in the image, making it powerful at detecting extremely ambiguous text. The CTPN works reliably on multi-scale and multi-language text without further post-processing, departing from previous bottom-up methods that require multi-step post-processing. It achieves…
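As a rough illustration of how a sequence of fine-scale, fixed-width proposals can be read out as text lines, the sketch below greedily chains neighbouring boxes that are horizontally close and vertically aligned, then takes the bounding box of each chain. This is a simplified stand-in, not necessarily the authors' procedure; the function names and the thresholds (a 50-pixel horizontal gap and 0.7 vertical overlap) are assumptions chosen for the example.

```python
def vertical_overlap(a, b):
    """Ratio of vertical intersection to vertical union of two (x1, y1, x2, y2) boxes."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    union = max(a[3], b[3]) - min(a[1], b[1])
    return max(inter, 0.0) / union if union > 0 else 0.0


def group_into_lines(boxes, max_gap=50, min_overlap=0.7):
    """Greedily chain left-to-right proposals into text-line bounding boxes."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[0]):
        for line in lines:
            last = line[-1]
            close_enough = box[0] - last[2] <= max_gap
            if close_enough and vertical_overlap(last, box) >= min_overlap:
                line.append(box)
                break
        else:
            # No existing chain accepts this proposal, so it starts a new line.
            lines.append([box])
    # Collapse each chain of proposals into one enclosing box.
    return [
        (min(b[0] for b in line), min(b[1] for b in line),
         max(b[2] for b in line), max(b[3] for b in line))
        for line in lines
    ]


proposals = [(0, 10, 16, 30), (16, 11, 32, 31), (32, 10, 48, 30), (0, 100, 16, 120)]
text_lines = group_into_lines(proposals)
```

In this toy input, the three slices near the top share a vertical range and merge into one line, while the slice at y = 100 stays separate.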
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
