A Unified Multilingual Handwriting Recognition System using multigrams sub-lexical units
Wassim Swaileh, Yann Soullard, Thierry Paquet

TL;DR
This paper presents a unified multilingual handwriting recognition system whose language model is built on multigrams (sub-lexical units), which reduces language model complexity while achieving state-of-the-art recognition performance.
Contribution
It introduces an approach that uses multigrams for language modeling, so that a single optical model and a single language model can be trained on all the languages in a handwriting recognition system.
Findings
Achieves state-of-the-art recognition performance.
Significantly reduces language model complexity.
Supports end-to-end multilingual recognition.
Abstract
We address the design of a unified multilingual system for handwriting recognition. Most multilingual systems rest on specialized models, each trained on a single language, with one of them selected at test time. While some recognition systems are based on a unified optical model, dealing with a unified language model remains a major issue, as traditional language models are generally trained on corpora composed of large word lexicons per language. Here, we bring a solution by considering language models based on sub-lexical units, called multigrams. Dealing with multigrams strongly reduces the lexicon size and thus decreases the language model complexity. This makes it possible to design an end-to-end unified multilingual recognition system where both a single optical model and a single language model are trained on all the languages. We discuss the impact of the…
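To make the idea of sub-lexical units concrete, here is a minimal Python sketch of splitting words into multigram-like units. It is an illustration under stated assumptions, not the authors' method: the unit inventory `INVENTORY`, the helper names `segment` and `unit_lexicon`, and the greedy longest-match strategy are all hypothetical stand-ins, since the abstract does not say how the multigrams are obtained.

```python
# Hypothetical inventory of variable-length character units (assumed, not from the paper).
INVENTORY = {"re", "co", "con", "gni", "tion", "nais", "san", "ce", "na", "al"}


def segment(word, inventory, max_len=4):
    """Greedy longest-match split of `word` into units from `inventory`.

    Falls back to a single character when no longer unit matches,
    so every word can always be segmented.
    """
    units = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, down to one character.
        for n in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if n == 1 or piece in inventory:
                units.append(piece)
                i += n
                break
    return units


def unit_lexicon(words, inventory):
    """Collect the distinct units needed to cover a word list."""
    lexicon = set()
    for word in words:
        lexicon.update(segment(word, inventory))
    return lexicon


# One English and one French word share units such as "re".
words = ["recognition", "reconnaissance"]
for w in words:
    print(w, segment(w, INVENTORY))
print(sorted(unit_lexicon(words, INVENTORY)))
```

Because units such as "re" are reused across words and across languages, the unit lexicon of a real corpus would be expected to grow far more slowly than its word lexicon, which is the kind of reduction in language model complexity the abstract describes.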
