A Transformer-based Approach for Arabic Offline Handwritten Text Recognition
Saleh Momeni, Bagher BabaAli

TL;DR
This paper introduces Transformer-based architectures for offline Arabic handwritten text recognition, improving accuracy and parallelization over traditional RNN-based methods by leveraging attention mechanisms and pre-trained models.
Contribution
The paper proposes novel Transformer architectures for Arabic handwriting recognition, replacing RNNs, and demonstrates superior performance on benchmark datasets.
Findings
Outperforms state-of-the-art methods in accuracy
Offers faster processing due to parallelizable attention mechanisms
Effectively models language dependencies without external language models
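The parallelization finding can be illustrated with a minimal sketch. It is not the authors' implementation. It uses generic scaled dot-product attention next to a toy recurrence, and the function names, the `decay` parameter, and the sample `frames` are assumptions made for illustration. Each attention output depends only on its own query, so the rows can be computed independently. Each recurrent state must wait for the previous one.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (hypothetical helper)."""
    d = len(keys[0])
    outputs = []
    for q in queries:  # no iteration reads the result of another iteration
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


def recurrent(inputs, decay=0.5):
    """Toy recurrence (an assumption, not an LSTM): step t needs the state from step t-1."""
    state = [0.0] * len(inputs[0])
    states = []
    for x in inputs:
        state = [decay * s + xi for s, xi in zip(state, x)]
        states.append(state)
    return states


# Three made-up feature frames, standing in for image features of a text line.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = attention(frames, frames, frames)
states = recurrent(frames)
```

Calling `attention([frames[1]], frames, frames)` on its own returns the same row as `attended[1]`, which is what allows all positions to be processed at once on parallel hardware.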
Abstract
Handwriting recognition is a challenging and critical problem in the fields of pattern recognition and machine learning, with applications spanning a wide range of domains. In this paper, we focus on the specific issue of recognizing offline Arabic handwritten text. Existing approaches typically utilize a combination of convolutional neural networks for image feature extraction and recurrent neural networks for temporal modeling, with connectionist temporal classification used for text generation. However, these methods suffer from a lack of parallelization due to the sequential nature of recurrent neural networks. Furthermore, these models cannot account for linguistic rules, necessitating the use of an external language model in the post-processing stage to boost accuracy. To overcome these issues, we introduce two alternative architectures, namely the Transformer Transducer and the…
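The connectionist temporal classification step mentioned in the abstract can be sketched with greedy (best-path) decoding, a common way to turn per-frame scores into text. This is a generic illustration and not code from the paper. The function name, the three-symbol alphabet, and the scores are all assumptions.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding: take the top symbol per frame,
    merge consecutive repeats, then drop the blank symbol."""
    best = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]
    decoded = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)


# Hypothetical alphabet; index 0 is reserved for the CTC blank.
alphabet = ["<blank>", "a", "b"]

# Made-up per-frame scores whose arg-max path is a, a, blank, a, b, b, blank.
frame_scores = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.60, 0.30],
    [0.10, 0.20, 0.70],
    [0.10, 0.10, 0.80],
    [0.90, 0.05, 0.05],
]

text = ctc_greedy_decode(frame_scores, alphabet)
```

The blank between the two runs of `a` keeps them from being merged, so the path decodes to `aab`. Greedy decoding of this kind treats each frame independently and has no notion of which character sequences are plausible, which is one reason such pipelines often add an external language model.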
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Linear Layer · Softmax · Layer Normalization · Dense Connections · Dropout · Focus · Position-Wise Feed-Forward Layer
