A Few Shot Multi-Representation Approach for N-gram Spotting in Historical Manuscripts

Giuseppe De Gregorio; Sanket Biswas; Mohamed Ali Souibgui; Asma Bensalah; Josep Lladós; Alicia Fornés; Angelo Marcelli

arXiv:2209.10441 · cs.CV · September 22, 2022 · 1 citation

TL;DR

This paper introduces a few-shot learning method for N-gram spotting in historical manuscripts, reducing vocabulary dependency and improving recognition with limited labeled data.

Contribution

It proposes a novel multi-representation few-shot approach for N-gram spotting, addressing data scarcity in historical manuscript recognition.

Findings

01

Achieved promising results on Bentham's manuscripts

02

Reduced out-of-vocabulary word issues

03

Demonstrated effectiveness of few-shot N-gram spotting

Abstract

Despite recent advances in automatic text recognition, performance remains moderate on historical manuscripts, mainly because labelled data is too scarce to train data-hungry Handwritten Text Recognition (HTR) models. Keyword Spotting (KWS) systems offer a valid alternative to HTR because of their lower error rate, but they are usually limited to a closed reference vocabulary. In this paper, we propose a few-shot learning paradigm for spotting sequences of a few characters (N-grams) that requires only a small amount of labelled training data. We show that recognizing important N-grams can reduce the system's dependency on the vocabulary: an out-of-vocabulary (OOV) word in an input handwritten line image can be treated as a sequence of N-grams that belong to the lexicon. An extensive experimental evaluation of our proposed…
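The idea that an OOV word can be covered by N-grams from a lexicon can be illustrated at the text level. The sketch below is not the paper's method (which spots N-grams in handwritten line images); it only shows, under assumed inputs, how a word string might be split into known N-grams. The function name `segment_into_ngrams`, the toy lexicon, and the `max_n` limit are all hypothetical.

```python
from typing import List, Optional, Set


def segment_into_ngrams(
    word: str, ngram_lexicon: Set[str], max_n: int = 3
) -> Optional[List[str]]:
    """Split `word` into the fewest lexicon N-grams; return None if impossible."""
    # best[i] holds the shortest known segmentation of word[:i],
    # or None when that prefix cannot be covered by lexicon N-grams.
    best: List[Optional[List[str]]] = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for n in range(1, max_n + 1):
            start = end - n
            if start < 0:
                break
            prev = best[start]
            piece = word[start:end]
            if prev is None or piece not in ngram_lexicon:
                continue
            candidate = prev + [piece]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[len(word)]


# Toy lexicon (an assumption for illustration, not taken from the paper).
lexicon = {"th", "e", "in", "ing", "er", "k"}

# "thinker" is not a lexicon entry, yet it is covered by known N-grams.
print(segment_into_ngrams("thinker", lexicon))
# A word with no covering N-grams cannot be segmented.
print(segment_into_ngrams("xyz", lexicon))
```

In this toy setting, a word that never appears in the vocabulary is still reachable as long as each of its pieces is a known N-gram, which is the sense in which N-gram spotting loosens the closed-vocabulary constraint.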

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Text and Document Classification Technologies