Towards Unconstrained End-to-End Text Spotting
Siyang Qin, Alessandro Bissacco, Michalis Raptis, Yasuhisa Fujii, Ying Xiao

TL;DR
This paper introduces an end-to-end trainable network capable of detecting and recognizing arbitrarily shaped scene text, significantly advancing the ability to read irregular text in images.
Contribution
It formulates irregularly shaped text detection as an instance segmentation problem and uses an attention model to recognize each text region without rectification, improving end-to-end accuracy on the ICDAR15 and Total-Text benchmarks.
Findings
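Surpassed the state of the art for end-to-end recognition on the ICDAR15 (straight text) benchmark by 4.6%.
Achieved an improvement of over 16% on Total-Text.
Introduced an RoI masking step to extract features of irregularly shaped text instances from image-scale features.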
Abstract
We propose an end-to-end trainable network that can simultaneously detect and recognize text of arbitrary shape, making substantial progress on the open problem of reading scene text of irregular shape. We formulate arbitrary shape text detection as an instance segmentation problem; an attention model is then used to decode the textual content of each irregularly shaped text region without rectification. To extract useful irregularly shaped text instance features from image scale features, we propose a simple yet effective RoI masking step. Additionally, we show that predictions from an existing multi-step OCR engine can be leveraged as partially labeled training data, which leads to significant improvements in both the detection and recognition accuracy of our model. Our method surpasses the state-of-the-art for end-to-end recognition tasks on the ICDAR15 (straight) benchmark by 4.6%,…
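The RoI masking step mentioned in the abstract can be illustrated with a small sketch. The summary does not spell out the exact operation, so the code below assumes one plausible reading: crop the feature map to the instance's bounding box, then multiply each position by the binary instance mask so that background features are zeroed before recognition. The function name `roi_mask`, the box convention, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # [height][width][channels]
Mask = List[List[int]]                # [height][width], 1 = text pixel


def roi_mask(features: FeatureMap, mask: Mask,
             box: Tuple[int, int, int, int]) -> FeatureMap:
    """Crop `features` to `box` and zero out positions outside `mask`.

    `box` is (top, left, bottom, right) with exclusive bottom/right.
    This convention is an illustrative assumption, not the paper's.
    """
    top, left, bottom, right = box
    cropped = []
    for y in range(top, bottom):
        row = []
        for x in range(left, right):
            keep = mask[y][x]  # 1 inside the text instance, 0 outside
            row.append([value * keep for value in features[y][x]])
        cropped.append(row)
    return cropped


# Toy 3x4 feature map with 2 channels; every position holds [1.0, 2.0].
features = [[[1.0, 2.0] for _ in range(4)] for _ in range(3)]

# A slanted instance mask: the text runs diagonally through the box.
mask = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]

masked = roi_mask(features, mask, box=(0, 1, 3, 4))
for row in masked:
    print(row)
```

With an axis-aligned crop alone, the recognizer would also see features from background or neighboring text inside the box; masking keeps only the positions that belong to the slanted instance.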
Taxonomy
Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Image Processing and 3D Reconstruction
