Enhancing Handwritten Text Recognition with N-gram sequence decomposition and Multitask Learning
Vasiliki Tassopoulou, George Retsinas, Petros Maragos

TL;DR
This paper introduces a multi-task learning approach for handwritten text recognition that decomposes the target sequence into n-grams during training to improve accuracy, outperforming single-task models at no extra inference cost.
Contribution
It proposes a novel multi-task training scheme with n-gram decomposition and two network architectures, enhancing recognition performance by leveraging implicit language modeling.
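One way to read the multi-task scheme is as a combined training objective over the n-gram levels, with only the unigram branch used at recognition time. The formulation below is a sketch under stated assumptions, not the paper's exact loss: the per-task sequence loss \mathcal{L}_n, the weights \lambda_n, and the head notation f_n are hypothetical names introduced for illustration.

```latex
% Assumed notation: \mathcal{L}_n is the sequence loss of the n-gram task,
% \lambda_n is a hypothetical per-task weight, f_1 is the unigram output head.
\mathcal{L}_{\text{total}} = \sum_{n=1}^{4} \lambda_n \, \mathcal{L}_n ,
\qquad
\hat{y} = \operatorname{decode}\bigl(f_1(x)\bigr)
```

Under this reading, the bigram to fourgram terms shape the shared representation during training, while inference evaluates only f_1, which is why no extra inference cost is incurred.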
Findings
Outperforms the single-task counterpart by an absolute 2.52% WER and 1.02% CER.
Uses n-gram decomposition from unigrams to fourgrams during training.
Achieves better accuracy with no additional computational cost at inference.
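To make the decomposition concrete, here is a minimal Python sketch that builds unigram to fourgram targets from one transcription. It assumes overlapping, stride-1 character n-grams; the summary does not say how the paper handles overlap or word boundaries, so the function names `ngram_decompose` and `multi_granularity_targets` and the windowing choice are illustrative assumptions.

```python
from typing import Dict, List


def ngram_decompose(text: str, n: int) -> List[str]:
    """Split text into overlapping character n-grams (stride 1).

    Assumption: a sliding window over the raw string; the paper's
    exact decomposition may differ.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    if len(text) < n:
        return []  # text too short to contain any n-gram of this size
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def multi_granularity_targets(text: str, max_n: int = 4) -> Dict[int, List[str]]:
    """Build one target sequence per granularity, from unigrams up to max_n-grams."""
    return {n: ngram_decompose(text, n) for n in range(1, max_n + 1)}


if __name__ == "__main__":
    for n, units in multi_granularity_targets("word").items():
        print(n, units)
```

Each granularity would supply the target sequence for its own output branch during training; only the unigram sequence is needed to read out the final transcription.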
Abstract
Current state-of-the-art approaches in the field of Handwritten Text Recognition are predominantly single-task, with unigram, character-level target units. In our work, we utilize a Multi-task Learning scheme, training the model to perform decompositions of the target sequence with target units of different granularity, from fine to coarse. We consider this method a way to utilize n-gram information implicitly in the training process, while the final recognition is performed using only the unigram output. Unigram decoding of such a multi-task approach highlights the capability of the learned internal representations, imposed by the different n-grams at the training step. We select n-grams as our target units and we experiment from unigrams to fourgrams, namely subword-level granularities. These multiple decompositions are…
