A Probabilistic Framework for Lexicon-based Keyword Spotting in Handwritten Text Images
E. Vidal, A.H. Toselli, J. Puigcerver

TL;DR
This paper introduces a probabilistic framework for lexicon-based keyword spotting in handwritten text images, enabling efficient indexing and search without relying on pre-existing transcripts, and demonstrates superior results on multiple datasets.
Contribution
It presents a novel probabilistic approach that leverages handwriting recognition models for keyword spotting, improving accuracy and applicability in large handwritten document collections.
Findings
Significantly better results on benchmark datasets.
Effective indexing and search in large handwritten collections.
Potential for generalization to various handwritten text datasets.
Abstract
Query by String Keyword Spotting (KWS) is here considered as a key technology for indexing large collections of handwritten text images to allow fast textual access to the contents of these collections. Under this perspective, a probabilistic framework for lexicon-based KWS in text images is presented. The presentation aims at providing a tutorial view that helps to understand the relations between classical statements of KWS and the relative challenges entailed by these statements. More specifically, the development of the proposed framework makes it self-evident that word recognition or classification implicitly or explicitly underlies any formulation of KWS. Moreover, it clearly suggests that the same statistical models and training methods successfully used for handwriting text recognition can advantageously be used also for KWS, even though KWS does not generally require or rely on…
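To make the link between word recognition and KWS concrete, the sketch below scores how likely a keyword is to appear in a text line, given per-word posterior probabilities from a word classifier. It is an illustrative sketch only, not the formulation developed in the paper: it assumes the line has already been segmented into word images, that each word image comes with a posterior over a lexicon, and (for the second scoring function) that word images are independent. All names (`relevance_max`, `relevance_noisy_or`, `build_index`) and the toy posteriors are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input type: one posterior distribution over lexicon words
# per pre-segmented word image in a text line.
WordPosterior = Dict[str, float]


def relevance_max(posteriors: List[WordPosterior], keyword: str) -> float:
    """Score a line by the best single word-image posterior for the keyword."""
    return max((p.get(keyword, 0.0) for p in posteriors), default=0.0)


def relevance_noisy_or(posteriors: List[WordPosterior], keyword: str) -> float:
    """Probability that at least one word image is the keyword,
    assuming (for illustration) that word images are independent."""
    prob_absent = 1.0
    for p in posteriors:
        prob_absent *= 1.0 - p.get(keyword, 0.0)
    return 1.0 - prob_absent


def build_index(
    lines: Dict[str, List[WordPosterior]], threshold: float
) -> Dict[str, List[Tuple[str, float]]]:
    """Build an inverted index: keyword -> [(line_id, score), ...],
    keeping only entries whose score reaches the threshold."""
    index: Dict[str, List[Tuple[str, float]]] = {}
    for line_id, posteriors in lines.items():
        # Every lexicon word with non-zero posterior somewhere in the line
        # is a candidate keyword for that line.
        candidates = {word for p in posteriors for word in p}
        for word in candidates:
            score = relevance_noisy_or(posteriors, word)
            if score >= threshold:
                index.setdefault(word, []).append((line_id, score))
    # Sort each posting list so the most confident lines come first.
    for postings in index.values():
        postings.sort(key=lambda item: item[1], reverse=True)
    return index


# Toy posteriors, invented for illustration only.
toy_lines = {
    "line-1": [{"the": 0.9, "then": 0.1}, {"cat": 0.6, "cut": 0.4}, {"cat": 0.5, "sat": 0.5}],
    "line-2": [{"a": 1.0}, {"cut": 0.7, "cat": 0.3}],
}
toy_index = build_index(toy_lines, threshold=0.25)
```

Once such an index is built, a string query reduces to a dictionary lookup, which is the sense in which no transcript is needed at search time. Raising the threshold trades recall for precision and shrinks the index.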
Taxonomy
Topics: Handwritten Text Recognition Techniques · Music and Audio Processing · Image Retrieval and Classification Techniques
