Query by String word spotting based on character bi-gram indexing
Suman K. Ghosh, Ernest Valveny

TL;DR
This paper introduces a segmentation-free method for query-by-string word spotting in document images, utilizing character bi-gram indexing and attribute representations to improve search efficiency and accuracy.
Contribution
It presents a novel character bi-gram indexing approach combined with attribute models and integral image representation for efficient, segmentation-free word spotting.
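The integral image idea mentioned above can be illustrated with a minimal sketch. It assumes each cell of a coarse page grid holds one scalar score for a single attribute; the grid layout and the names `integral_image` and `region_sum` are hypothetical and are not taken from the paper. Once the cumulative table is built, the summed score of any rectangular region costs four lookups, which is what makes scanning many candidate windows cheap.

```python
from typing import List, Sequence


def integral_image(grid: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of grid[:y][:x]."""
    h = len(grid)
    w = len(grid[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += grid[y][x]
            # Sum above this cell plus the running sum of the current row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def region_sum(ii: List[List[float]], top: int, left: int,
               bottom: int, right: int) -> float:
    """Sum of the cells in rows [top, bottom) and columns [left, right)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Toy 2 x 3 grid of per-cell scores for one attribute (illustrative values).
scores = [[1, 2, 3],
          [4, 5, 6]]
table = integral_image(scores)
whole_page = region_sum(table, 0, 0, 2, 3)
lower_right = region_sum(table, 1, 1, 2, 3)
```

In a full system one such table would presumably be kept per attribute dimension, so that the attribute vector of any window can be aggregated in constant time per dimension.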
Findings
Achieved state-of-the-art results on standard datasets
Demonstrated effectiveness of bi-gram indexing for fast retrieval
Improved retrieval performance with re-ranking step
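The role of a bi-gram index in fast retrieval can be sketched with a toy inverted index. This is only an illustration under stated assumptions: the region tuples, the per-region bi-gram scores, the threshold, and the vote-based ranking are all hypothetical, and the paper's actual indexing and scoring scheme may differ. The query string is split into character bi-grams, and only regions indexed under at least one of them are considered as candidates.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical region descriptor: (page, x, y, width, height).
Region = Tuple[int, int, int, int, int]


def bigrams(text: str) -> List[str]:
    """Consecutive character pairs of a string, e.g. 'the' -> ['th', 'he']."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_index(region_scores: Dict[Region, Dict[str, float]],
                threshold: float) -> Dict[str, List[Region]]:
    """Index each region under every bi-gram whose (assumed) score passes the threshold."""
    index: Dict[str, List[Region]] = defaultdict(list)
    for region, scores in region_scores.items():
        for bigram, score in scores.items():
            if score >= threshold:
                index[bigram].append(region)
    return dict(index)


def candidates(index: Dict[str, List[Region]], query: str) -> List[Region]:
    """Regions that share bi-grams with the query, most shared bi-grams first."""
    votes: Dict[Region, int] = defaultdict(int)
    for bigram in set(bigrams(query.lower())):
        for region in index.get(bigram, []):
            votes[region] += 1
    return sorted(votes, key=lambda r: (-votes[r], r))


# Illustrative scores only; a real system would obtain them from a learned model.
region_scores = {
    (0, 10, 20, 40, 15): {"th": 0.9, "he": 0.8},
    (0, 60, 20, 40, 15): {"he": 0.7, "an": 0.2},
}
index = build_index(region_scores, threshold=0.5)
hits = candidates(index, "the")
```

The point of the design is that the query never has to be compared against every region on the page; the index narrows the search to a short candidate list that a later step can score more carefully.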
Abstract
In this paper we propose a segmentation-free query by string word spotting method. Both the documents and query strings are encoded using a recently proposed word representation that projects images and strings into a common attribute space based on a pyramidal histogram of characters (PHOC). These attribute models are learned using linear SVMs over the Fisher Vector representation of the images along with the PHOC labels of the corresponding strings. In order to search through the whole page, document regions are indexed per character bi-gram using a similar attribute representation. On top of that, we propose an integral image representation of the document using a simplified version of the attribute model for efficient computation. Finally we introduce a re-ranking step in order to boost retrieval performance. We show state-of-the-art results for segmentation-free query by string…
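To make the PHOC encoding of a query string concrete, here is a minimal sketch. The alphabet, the pyramid levels, and the 50% overlap rule are assumptions chosen for illustration and are not confirmed by the abstract; published PHOC variants may also append bins for frequent bi-grams, which this sketch omits. At each level the word is split into equal regions, and a bit is set when a character mostly falls inside a region.

```python
from typing import List, Sequence

# Assumed alphabet: lowercase letters and digits.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def phoc(word: str, levels: Sequence[int] = (2, 3, 4, 5),
         alphabet: str = ALPHABET) -> List[int]:
    """Binary pyramidal histogram of characters for a string (illustrative variant)."""
    word = word.lower()
    n = len(word)
    vector: List[int] = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(alphabet)
            # Work on an integer axis of length n * level to avoid float rounding:
            # character i spans [i * level, (i + 1) * level],
            # region r spans [r * n, (r + 1) * n].
            r_start, r_end = region * n, (region + 1) * n
            for i, ch in enumerate(word):
                idx = alphabet.find(ch)
                if idx < 0:
                    continue  # characters outside the alphabet are ignored
                c_start, c_end = i * level, (i + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # Assign the character if at least half of it lies in the region.
                if 2 * overlap >= level:
                    bits[idx] = 1
            vector.extend(bits)
    return vector


query_vector = phoc("word")
two_level = phoc("ab", levels=(2,))
```

With these assumed settings the vector has (2 + 3 + 4 + 5) × 36 = 504 binary attributes. Because strings and image regions are described in the same attribute space, a query vector built this way can be compared directly with attribute scores predicted for a document region.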
