TextBoxes: A Fast Text Detector with a Single Deep Neural Network

Minghui Liao; Baoguang Shi; Xiang Bai; Xinggang Wang; Wenyu Liu

arXiv:1611.06779 · cs.CV · November 22, 2016 · 445 cites

TL;DR

TextBoxes is a fast, end-to-end trainable deep neural network for scene text detection. It localizes text more accurately than competing methods while running faster, and, combined with a text recognizer, improves word spotting and end-to-end recognition results.

Contribution

The paper proposes a single-network architecture for scene text detection that is both fast and accurate, and whose only post-processing step is standard non-maximum suppression.

Findings

01. Detects text in 0.09 s per image (fast implementation)
02. Outperforms state-of-the-art methods in text localization accuracy
03. Significantly improves word spotting and end-to-end text recognition results when combined with a text recognizer

Abstract

This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-processing except standard non-maximum suppression. TextBoxes outperforms competing methods in terms of text localization accuracy and is much faster, taking only 0.09 s per image in a fast implementation. Furthermore, combined with a text recognizer, TextBoxes significantly outperforms state-of-the-art approaches on word spotting and end-to-end text recognition tasks.
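The abstract states that the only post-processing step is standard non-maximum suppression (NMS). As an illustration of what that step does, here is a minimal sketch of generic greedy NMS, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max) with one confidence score each. The function names `iou` and `nms` and the default `iou_threshold=0.5` are illustrative assumptions, not values or code taken from the paper or its released implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so non-overlapping boxes have no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    iou_threshold=0.5 is an assumed default, not the paper's setting.
    """
    # Visit candidates from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining candidates that overlap the kept box too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    # Two near-duplicate detections of one word plus a separate word.
    boxes = [(10, 10, 110, 40), (12, 11, 108, 42), (200, 15, 280, 45)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # prints [0, 2]
```

The two overlapping candidates have an IoU of about 0.87, so the lower-scoring one is suppressed, while the distant box survives. Library implementations (for example `torchvision.ops.nms`) vectorize the same greedy procedure.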

Taxonomy

Topics: Handwritten Text Recognition Techniques · Music and Audio Processing · Advanced Image and Video Retrieval Techniques