A Few Shot Multi-Representation Approach for N-gram Spotting in Historical Manuscripts
Giuseppe De Gregorio, Sanket Biswas, Mohamed Ali Souibgui, Asma Bensalah, Josep Lladós, Alicia Fornés, Angelo Marcelli

TL;DR
This paper introduces a few-shot learning method for N-gram spotting in historical manuscripts, reducing vocabulary dependency and improving recognition with limited labeled data.
Contribution
It proposes a novel multi-representation few-shot approach for N-gram spotting, addressing data scarcity in historical manuscript recognition.
Findings
Achieved promising results on Bentham's manuscripts
Reduced out-of-vocabulary word issues
Demonstrated effectiveness of few-shot N-gram spotting
Abstract
Despite recent advances in automatic text recognition, the performance remains moderate when it comes to historical manuscripts. This is mainly because of the scarcity of available labelled data to train the data-hungry Handwritten Text Recognition (HTR) models. The Keyword Spotting System (KWS) provides a valid alternative to HTR due to the reduction in error rate, but it is usually limited to a closed reference vocabulary. In this paper, we propose a few-shot learning paradigm for spotting sequences of a few characters (N-gram) that requires a small amount of labelled training data. We exhibit that recognition of important n-grams could reduce the system's dependency on vocabulary. In this case, an out-of-vocabulary (OOV) word in an input handwritten line image could be a sequence of n-grams that belong to the lexicon. An extensive experimental evaluation of our proposed…
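The abstract's claim that an OOV word can be read as a sequence of in-lexicon n-grams can be illustrated with a small text-only sketch. This is not the authors' spotting method, which works on handwritten line images. It is a toy dynamic-programming split over character strings, and the function name `cover_with_ngrams` and the example lexicon are assumptions made for illustration only.

```python
from typing import List, Optional, Set


def cover_with_ngrams(word: str, ngram_lexicon: Set[str]) -> Optional[List[str]]:
    """Split `word` into the fewest n-grams drawn from `ngram_lexicon`.

    Returns the list of n-grams in reading order, or None when the word
    cannot be covered by the lexicon at all.
    """
    # best[i] holds the shortest known split of word[:i], or None if unreachable.
    best: List[Optional[List[str]]] = [None] * (len(word) + 1)
    best[0] = []
    max_len = max((len(g) for g in ngram_lexicon), default=0)

    for end in range(1, len(word) + 1):
        # Only look back as far as the longest n-gram in the lexicon.
        for start in range(max(0, end - max_len), end):
            prefix_split = best[start]
            piece = word[start:end]
            if prefix_split is None or piece not in ngram_lexicon:
                continue
            candidate = prefix_split + [piece]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate

    return best[len(word)]


if __name__ == "__main__":
    # Hypothetical n-gram lexicon; "therein" itself is not a known word.
    lexicon = {"th", "the", "re", "in", "ing"}
    print(cover_with_ngrams("therein", lexicon))  # ['the', 're', 'in']
    print(cover_with_ngrams("thus", lexicon))     # None
```

In this sketch a word outside the word-level vocabulary is still recoverable whenever its characters can be tiled by known n-grams, which is the sense in which n-gram spotting reduces dependency on a closed vocabulary.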
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Text and Document Classification Technologies
