TL;DR
This paper introduces a new balanced dataset for Bangla handwritten characters and proposes a hybrid deep learning model with multi-head attention for improved recognition accuracy.
Contribution
It presents a novel interaction-aware architecture combining EfficientNetB3, Vision Transformer, and Conformer modules with a multi-head cross-attention mechanism, along with a new dataset.
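The summary does not say exactly how the branches are fused. Assume the cross-attention lets tokens from one branch (for example, EfficientNetB3 feature-map tokens) act as queries over tokens from another branch (for example, Vision Transformer or Conformer outputs). Under that assumption, standard scaled dot-product multi-head cross-attention can be sketched in plain Python. The function names, projection weights, token counts, and model width below are hypothetical placeholders, not the authors' implementation.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [[sum(row[t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    # Hypothetical stand-in for learned weights or extracted features.
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]

def multi_head_cross_attention(query_tokens, context_tokens, num_heads, params):
    """Tokens from one branch (queries) attend to tokens from another branch (keys/values)."""
    d_model = len(query_tokens[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    q = matmul(query_tokens, params["w_q"])
    k = matmul(context_tokens, params["w_k"])
    v = matmul(context_tokens, params["w_v"])
    head_outputs = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        q_h = [row[cols] for row in q]
        k_h = [row[cols] for row in k]
        v_h = [row[cols] for row in v]
        # Scaled dot-product scores, shape (n_query x n_context), normalized per query.
        scores = matmul(q_h, transpose(k_h))
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v_h))
    # Concatenate heads along the feature axis, then apply the output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(query_tokens))]
    return matmul(concat, params["w_o"])

rng = random.Random(0)
d_model, num_heads = 8, 2
params = {name: random_matrix(d_model, d_model, rng) for name in ("w_q", "w_k", "w_v", "w_o")}
# Hypothetical inputs: 4 CNN-derived tokens querying 6 transformer-derived tokens.
cnn_tokens = random_matrix(4, d_model, rng)
vit_tokens = random_matrix(6, d_model, rng)
fused = multi_head_cross_attention(cnn_tokens, vit_tokens, num_heads, params)
print(len(fused), len(fused[0]))  # 4 8
```

The output keeps one row per query token, so the fused features stay aligned with the querying branch. This is one common design choice for cross-branch fusion. Swapping which branch supplies the queries reverses the direction of interaction.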
Findings
Achieved 98.84% accuracy on the new dataset
Achieved 96.49% accuracy on an external benchmark
Demonstrated strong generalization and interpretability
Abstract
Character recognition is a fundamental component of an optical character recognition (OCR) system. Higher-order tasks such as word recognition, sentence transcription, document digitization, and language processing all depend on accurate character recognition. Nonetheless, recognizing handwritten Bangla characters is difficult because they are written in varied styles with inconsistent stroke patterns, and many characters closely resemble one another. Available datasets are often limited in intra-class variability and imbalanced in class distribution. To address these problems, we constructed a new balanced dataset of Bangla handwritten characters. It consists of 78 classes with approximately 650 samples per class, covering basic characters, composite (Juktobarno) characters, and numerals. The samples were collected from a diverse group comprising a large age…
