Improving patch-based scene text script identification with ensembles of conjoined networks

Lluis Gomez; Anguelos Nicolaou; Dimosthenis Karatzas

arXiv:1602.07480·cs.CV·February 2, 2017·1 cites

TL;DR

This paper introduces an ensemble of conjoined networks for patch-based script identification in scene text. The patch-based approach handles the highly variable aspect ratio of scene text instances and achieves state-of-the-art results on two public datasets.

Contribution

The paper proposes a novel ensemble of conjoined networks for patch-based script identification, addressing the aspect ratio variability that holistic CNN classifiers fail to handle.

Findings

01 Achieved state-of-the-art results on two public datasets

02 Demonstrated the effectiveness of patch-based classification for script identification

03 Showed the importance of script identification in end-to-end scene text reading systems

Abstract

This paper focuses on the problem of script identification in scene text images. Facing this problem with state-of-the-art CNN classifiers is not straightforward, as they fail to address a key characteristic of scene text instances: their extremely variable aspect ratio. Instead of resizing input images to a fixed aspect ratio, as in the typical use of holistic CNN classifiers, we propose a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class. We describe a novel method based on ensembles of conjoined networks that jointly learns discriminative stroke-part representations and their relative importance in a patch-based classification scheme. Our experiments with this learning procedure demonstrate state-of-the-art results on two public script identification datasets. In addition, we propose a new…
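The patch-based idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 32-pixel patch size, the 16-pixel stride, and the weighted-average voting are all assumptions made for illustration, and the weighted average is only a simple stand-in for the learned relative importance of patches. The per-patch classifier itself is left out; the sketch only shows how a text line can be cut into square patches without squashing it to a fixed aspect ratio, and how per-patch scores might be merged into one image-level decision.

```python
import numpy as np

def resize_to_height(image, height):
    """Nearest-neighbour resize to a fixed height, keeping the aspect ratio.

    Assumption: images narrower than `height` after scaling are stretched to
    a square so that at least one patch can be extracted.
    """
    h, w = image.shape[:2]
    new_w = max(height, int(round(w * height / h)))
    rows = np.arange(height) * h // height
    cols = np.arange(new_w) * w // new_w
    return image[rows][:, cols]

def extract_patches(image, patch_size=32, stride=16):
    """Slide a square window along the width of a fixed-height text line."""
    line = resize_to_height(image, patch_size)
    width = line.shape[1]
    starts = list(range(0, width - patch_size + 1, stride))
    # Add a final window flush with the right edge if the stride skipped it.
    if starts[-1] != width - patch_size:
        starts.append(width - patch_size)
    return np.stack([line[:, s:s + patch_size] for s in starts])

def combine_patch_scores(probs, weights=None):
    """Weighted average of per-patch class probabilities (hypothetical voting)."""
    probs = np.asarray(probs, dtype=float)
    if weights is None:
        weights = np.ones(len(probs))
    weights = np.asarray(weights, dtype=float)
    scores = (weights[:, None] * probs).sum(axis=0) / weights.sum()
    return int(np.argmax(scores)), scores
```

With this sketch, a wide word image yields many patches and a short one yields few, so the number of votes scales with the text length instead of the text being distorted to fit a fixed input size.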

Code & Models

Repositories

lluisgomez/script_identification (Official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Music and Audio Processing