Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network
Tong He, Weilin Huang, Yu Qiao, Jian Yao

TL;DR
This paper presents a cascaded convolutional network approach for accurate, efficient scene text detection in natural images, achieving high accuracy and robustness across languages and orientations.
Contribution
The novel CCTN framework combines coarse-to-fine localization with customized convolutions, improving robustness and efficiency over previous methods.
Findings
Achieves F-measures of 0.84 and 0.86 on the ICDAR 2011 and ICDAR 2013 datasets, respectively.
Reports a substantial improvement over previous state-of-the-art results.
Handles multi-shape, multi-scale, and multi-language text effectively.
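For readers unfamiliar with the metric, the F-measure quoted above is the harmonic mean of precision and recall. The sketch below shows only that arithmetic; the function name and the example precision/recall values are hypothetical and are not taken from the paper, and the ICDAR protocols additionally define how detections are matched to ground truth, which is not modeled here.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        # Avoid division by zero when nothing was detected or matched.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values for illustration only, not results from the paper.
example = f_measure(0.9, 0.8)
print(round(example, 3))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector cannot reach a high F-measure by maximizing precision or recall alone.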
Abstract
We introduce a new top-down pipeline for scene text detection. We propose a novel Cascaded Convolutional Text Network (CCTN) that joins two customized convolutional networks for coarse-to-fine text localization. The CCTN quickly detects text regions roughly from a low-resolution image, and then accurately localizes text lines from each enlarged region. We cast previous character-based detection into direct text region estimation, avoiding multiple bottom-up post-processing steps. It exhibits surprising robustness and discriminative power by treating the whole text region as the detection object, which provides strong semantic information. We customize the convolutional network by developing rectangle convolutions and multiple in-network fusions. This enables it to handle multi-shape and multi-scale text efficiently. Furthermore, the CCTN is computationally efficient by sharing convolutional…
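The coarse-to-fine control flow described in the abstract can be sketched as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: both convolutional networks are replaced by simple intensity thresholding on a grayscale array, and every function name, the downscale factor, and both thresholds are hypothetical. Only the cascade structure (rough regions from a low-resolution image, then refinement inside each region at full resolution) mirrors the description.

```python
import numpy as np


def downscale(image, factor):
    """Shrink a 2-D array by an integer factor using block averaging."""
    h, w = image.shape
    h2, w2 = h // factor, w // factor
    blocks = image[:h2 * factor, :w2 * factor].reshape(h2, factor, w2, factor)
    return blocks.mean(axis=(1, 3))


def coarse_regions(heat, thresh):
    """Stand-in for the coarse network: box each 4-connected blob above thresh."""
    mask = heat >= thresh
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        stack, ys, xs = [(int(y), int(x))], [], []
        seen[y, x] = True
        while stack:  # iterative flood fill over the blob
            cy, cx = stack.pop()
            ys.append(cy)
            xs.append(cx)
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                inside = 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                if inside and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    stack.append((ny, nx))
        boxes.append((min(ys), min(xs), max(ys) + 1, max(xs) + 1))
    return boxes


def fine_localize(crop, thresh):
    """Stand-in for the fine network: tight box around pixels above thresh."""
    ys, xs = np.nonzero(crop >= thresh)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1


def detect(image, factor=4, coarse_thresh=0.3, fine_thresh=0.5):
    """Cascade: rough regions at low resolution, then refine at full resolution."""
    small = downscale(image, factor)
    results = []
    for y0, x0, y1, x1 in coarse_regions(small, coarse_thresh):
        # Map the coarse box back to full-resolution coordinates and crop it.
        Y0, X0, Y1, X1 = y0 * factor, x0 * factor, y1 * factor, x1 * factor
        box = fine_localize(image[Y0:Y1, X0:X1], fine_thresh)
        if box is not None:
            by0, bx0, by1, bx1 = box
            results.append((Y0 + by0, X0 + bx0, Y0 + by1, X0 + bx1))
    return results


# Synthetic 32x32 image with one bright "text line" (illustrative data only).
image = np.zeros((32, 32))
image[9:14, 5:20] = 1.0
print(detect(image))
```

In this toy version the second stage only examines pixels inside the regions proposed by the first stage. That is the cost-saving idea behind a cascade: the expensive, precise step runs on a small fraction of the image.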
Taxonomy
Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Image Processing and 3D Reconstruction
