COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit, Tomas Matera, Lukas Neumann, Jiri Matas, and Serge Belongie

TL;DR
The COCO-Text dataset provides a large, diverse collection of annotated natural images to advance text detection and recognition, highlighting current challenges and guiding future research in scene text understanding.
Contribution
This paper introduces the COCO-Text dataset with extensive annotations, enabling improved evaluation and development of scene text detection and recognition methods.
Findings
Over 173,000 text annotations in over 63,000 images
Analysis of state-of-the-art OCR approaches reveals significant shortcomings
Dataset facilitates benchmarking and future research in natural scene text recognition
Abstract
This paper describes the COCO-Text dataset. In recent years, large-scale datasets like SUN and ImageNet drove the advancement of scene understanding and object recognition. The goal of COCO-Text is to advance the state of the art in text detection and recognition in natural images. The dataset is based on the MS COCO dataset, which contains images of complex everyday scenes. The images were not collected with text in mind and thus contain a broad variety of text instances. To reflect the diversity of text in natural scenes, we annotate text with (a) location in terms of a bounding box, (b) fine-grained classification into machine printed text and handwritten text, (c) classification into legible and illegible text, (d) script of the text and (e) transcriptions of legible text. The dataset contains over 173k text annotations in over 63k images. We provide a statistical analysis of the…
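The five annotation attributes listed in the abstract, (a) through (e), can be pictured as one record per text instance. The following is a minimal sketch only, not the dataset's official schema or API: the class name `TextAnnotation`, the helper `legible_transcriptions`, all field names, the `(x, y, width, height)` box layout, and the example script value are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed label strings for attribute (b); the real dataset may encode these differently.
TEXT_CLASSES = {"machine printed", "handwritten"}


@dataclass
class TextAnnotation:
    """Hypothetical record for one annotated text instance."""
    bbox: Tuple[float, float, float, float]  # (a) location; (x, y, width, height) layout is assumed
    text_class: str                          # (b) "machine printed" or "handwritten"
    legible: bool                            # (c) legible vs. illegible
    script: str                              # (d) script of the text, e.g. "english" (assumed value)
    transcription: Optional[str] = None      # (e) given only for legible text

    def __post_init__(self) -> None:
        if self.text_class not in TEXT_CLASSES:
            raise ValueError(f"unknown text class: {self.text_class!r}")
        # The abstract states that transcriptions cover legible text only.
        if not self.legible and self.transcription is not None:
            raise ValueError("illegible text must not carry a transcription")


def legible_transcriptions(annotations: List[TextAnnotation]) -> List[str]:
    """Return the transcriptions of all legible, transcribed instances."""
    return [
        a.transcription
        for a in annotations
        if a.legible and a.transcription is not None
    ]
```

A recognition benchmark would plausibly score only the instances returned by a filter like `legible_transcriptions`, while detection could use every bounding box; that split is an assumption here, not something the abstract states.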
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
