Improving patch-based scene text script identification with ensembles of conjoined networks
Lluis Gomez, Anguelos Nicolaou, Dimosthenis Karatzas

TL;DR
This paper introduces an ensemble of conjoined networks for patch-based script identification in scene text. Classifying patches instead of resized whole images copes with the highly variable aspect ratio of text instances, and the method reports state-of-the-art results on two public datasets.
Contribution
It proposes a novel ensemble of conjoined networks for patch-based script identification, addressing aspect ratio variability and improving accuracy over existing CNN methods.
Findings
Achieved state-of-the-art results on two public datasets
Demonstrated the effectiveness of patch-based classification for script identification
Showed the importance of script identification in end-to-end scene text reading systems
Abstract
This paper focuses on the problem of script identification in scene text images. Facing this problem with state-of-the-art CNN classifiers is not straightforward, as they fail to address a key characteristic of scene text instances: their extremely variable aspect ratio. Instead of resizing input images to a fixed aspect ratio as in the typical use of holistic CNN classifiers, we propose here a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class. We describe a novel method based on the use of ensembles of conjoined networks to jointly learn discriminative stroke-parts representations and their relative importance in a patch-based classification scheme. Our experiments with this learning procedure demonstrate state-of-the-art results on two public script identification datasets. In addition, we propose a new…
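The patch-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the half-height stride, and the weighted-average combination rule are assumptions made for illustration, and the per-patch scores and weights are taken as given inputs, whereas the paper learns the representations and their relative importance jointly.

```python
from typing import List, Sequence


def patch_offsets(width: int, height: int, stride_frac: float = 0.5) -> List[int]:
    """Left x-offsets of square (height x height) patches slid along a text line.

    The image is never squeezed to a fixed aspect ratio: a wider image
    simply yields more patches. The stride is an assumed parameter.
    """
    if width <= height:
        return [0]  # a narrow image yields a single patch
    stride = max(1, int(height * stride_frac))
    offsets = list(range(0, width - height + 1, stride))
    if offsets[-1] != width - height:
        offsets.append(width - height)  # make sure the right edge is covered
    return offsets


def combine_patch_scores(
    scores: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of per-patch class scores (assumed combination rule)."""
    total = sum(weights)
    n_classes = len(scores[0])
    return [
        sum(w * s[c] for s, w in zip(scores, weights)) / total
        for c in range(n_classes)
    ]


def predict_script(
    scores: Sequence[Sequence[float]],
    weights: Sequence[float],
    labels: Sequence[str],
) -> str:
    """Image-level label: the class with the highest combined score."""
    combined = combine_patch_scores(scores, weights)
    best = max(range(len(labels)), key=lambda c: combined[c])
    return labels[best]
```

In this sketch, a 100x32 text line produces six overlapping 32x32 patches, and a patch with a large weight can dominate the image-level decision even when most other patches disagree, which is the intuition behind weighting discriminative parts more heavily.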
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Music and Audio Processing
