A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification
Mahya Ameryan, Lambert Schomaker

TL;DR
This paper introduces a small ensemble of five homogeneous convolutional LSTM networks that, combined with data augmentation and a voting scheme, achieves high accuracy in handwritten word recognition, reaching 96.6% on the RIMES benchmark.
Contribution
The paper presents an end-to-end convolutional LSTM network ensemble that handles both geometric and sequence variability in handwritten text recognition while keeping the ensemble limited to five networks.
Findings
Achieved 96.6% accuracy on RIMES dataset
Ensemble of five networks outperforms state-of-the-art methods
Effective on both modern and historical handwritten datasets
Abstract
In recent years, long short-term memory neural networks (LSTMs) have been applied quite successfully to problems in handwritten text recognition. However, their strength lies more in handling sequences of variable length than in handling the geometric variability of image patterns. Furthermore, the best results for LSTMs are often based on large-scale training of an ensemble of network instances. In this paper, an end-to-end convolutional LSTM neural network is used to handle both geometric variation and sequence variability. We show that high performance can be reached on a common benchmark set with just five such networks, using proper data augmentation, a proper coding scheme, and a proper voting scheme. The networks have similar architectures (Convolutional Neural Network (CNN): five layers, bidirectional LSTM (BiLSTM): three layers followed by a connectionist temporal…
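The abstract mentions combining five networks through a voting scheme but, as excerpted here, does not say which scheme is used. As a rough illustration only, the sketch below assumes simple plurality voting over the word hypotheses of each network, with ties broken in favour of the earliest network in the list. The function name `plurality_vote` and the example hypotheses are hypothetical and are not taken from the paper.

```python
from collections import Counter


def plurality_vote(hypotheses):
    """Return the word proposed by the largest number of networks.

    Assumption (not from the paper): ties are resolved in favour of the
    hypothesis that appears first in the input list.
    """
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    counts = Counter(hypotheses)
    best = max(counts.values())
    # Scan in input order so the tie-break is deterministic.
    for word in hypotheses:
        if counts[word] == best:
            return word


# Hypothetical outputs of five networks for a single word image.
votes = ["maison", "maison", "raison", "maison", "saison"]
print(plurality_vote(votes))
```

With five voters, a word needs only to be the most frequent hypothesis, not an absolute majority. Other schemes, such as confidence-weighted voting, would need per-network scores that this sketch does not model.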
Taxonomy
Methods: Test · Sigmoid Activation · Tanh Activation · Long Short-Term Memory
