Synthetic Data for Text Localisation in Natural Images
Ankush Gupta, Andrea Vedaldi, Andrew Zisserman

TL;DR
This paper presents a scalable synthetic data generation engine and a deep-learning text detection network trained on its output. Together they improve text localisation in natural images, reaching an F-measure of 84.2% on the ICDAR 2013 benchmark while processing 15 images per second on a GPU.
Contribution
The paper introduces a novel synthetic data engine for natural scene text and a deep learning model trained on this data for efficient, accurate text detection.
Findings
Achieved an F-measure of 84.2% on the standard ICDAR 2013 benchmark.
Processed 15 images per second on a GPU.
Outperformed existing methods for text detection in natural images.
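For readers unfamiliar with the metric quoted above, the F-measure is the harmonic mean of precision and recall. The sketch below only illustrates that formula. The function name and the input values are hypothetical, and it does not reproduce the ICDAR 2013 protocol for matching detected boxes to ground truth.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        # Both are zero: define the score as 0 to avoid dividing by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical inputs for illustration only; these are not results from the paper.
example_score = f_measure(precision=0.75, recall=0.6)
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a detector cannot reach a high F-measure by maximising precision or recall alone.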
Abstract
In this paper we introduce a new method for text detection in natural images. The method comprises two contributions: First, a fast and scalable engine to generate synthetic images of text in clutter. This engine overlays synthetic text to existing background images in a natural way, accounting for the local 3D scene geometry. Second, we use the synthetic images to train a Fully-Convolutional Regression Network (FCRN) which efficiently performs text detection and bounding-box regression at all locations and multiple scales in an image. We discuss the relation of FCRN to the recently-introduced YOLO detector, as well as other end-to-end object detection systems based on deep learning. The resulting detection network significantly outperforms current methods for text detection in natural images, achieving an F-measure of 84.2% on the standard ICDAR 2013 benchmark. Furthermore, it can…
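The idea of regressing a bounding box at every image location can be pictured with a minimal decoding sketch. It assumes, purely for illustration, that the network emits one tuple (dx, dy, w, h, c) per grid cell, where dx and dy are pixel offsets of the box centre from the cell centre and c is a confidence score. This layout, the function name decode_grid, and the cell_size and threshold parameters are assumptions and are not taken from the abstract.

```python
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max, confidence), all in image pixels
Box = Tuple[float, float, float, float, float]
# Hypothetical per-cell prediction: (dx, dy, width, height, confidence)
Cell = Tuple[float, float, float, float, float]


def decode_grid(pred: Sequence[Sequence[Cell]],
                cell_size: float,
                threshold: float = 0.5) -> List[Box]:
    """Turn a dense grid of per-cell predictions into a list of boxes."""
    boxes: List[Box] = []
    for row, cells in enumerate(pred):
        for col, (dx, dy, w, h, c) in enumerate(cells):
            if c < threshold:
                continue  # skip cells the network is not confident about
            # Box centre = cell centre plus the regressed offset.
            cx = (col + 0.5) * cell_size + dx
            cy = (row + 0.5) * cell_size + dy
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, c))
    return boxes


# A made-up 2x2 grid with a single confident cell at row 0, column 1.
empty = (0.0, 0.0, 0.0, 0.0, 0.0)
grid = [[empty, (2.0, -1.0, 20.0, 8.0, 0.9)],
        [empty, empty]]
detections = decode_grid(grid, cell_size=16.0)
```

Under the same assumptions, multiple scales could be covered by running the network on resized copies of the image and mapping the decoded boxes back to the original resolution.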
Videos
Synthetic Data for Text Localisation in Natural Images · YouTube
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
