Zone-based Keyword Spotting in Bangla and Devanagari Documents
Ayan Kumar Bhunia, Partha Pratim Roy, Umapada Pal

TL;DR
This paper introduces a zone-based keyword spotting system for Bangla and Devanagari text lines that uses HMM-based zone segmentation and a novel combined foreground-background feature to improve word spotting performance.
Contribution
It proposes a new zone segmentation method using HMMs and a combined foreground-background feature for better keyword spotting in Indic scripts.
Findings
Significant improvement in word spotting performance over the traditional approach
HMM-based zone segmentation effectively isolates script zones
Combined foreground-background features outperform individual features
Abstract
In this paper we present a word spotting system in text lines for offline Indic scripts such as Bangla (Bengali) and Devanagari. Recently, it was shown that a zone-wise recognition method improves word recognition performance over a conventional full-word recognition system in Indic scripts. Inspired by this idea, we adopt the zone segmentation approach and use middle zone information to improve traditional word spotting performance. To avoid the problems of heuristic zone segmentation, we propose an HMM-based approach to segment the upper and lower zone components from the text line images. The candidate keywords are searched for in a line without segmenting characters or words. We also propose a novel feature combining foreground and background information of text line images for keyword spotting with character filler models. A significant improvement in…
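The abstract does not spell out how the combined foreground-background feature is computed. As a rough illustration only, the sketch below shows one way a sliding window over a binarized text line could yield per-frame vectors that mix foreground cues (ink density, vertical centre of the ink) with background cues (blank margins above and below the ink), which is the kind of frame sequence an HMM would consume. The function name `sliding_window_features`, the parameters `win_width` and `step`, and the four chosen measurements are all assumptions made for this example, not the authors' feature.

```python
import numpy as np

def sliding_window_features(line_img, win_width=4, step=2):
    """Hypothetical per-window features for a binarized text line.

    line_img: 2D array where non-zero pixels are ink (foreground).
    Returns an array of shape (num_windows, 4) with columns:
    [ink density, ink centre, top background gap, bottom background gap].
    """
    img = (np.asarray(line_img) > 0).astype(np.uint8)
    height, width = img.shape
    feats = []
    for x in range(0, width - win_width + 1, step):
        win = img[:, x:x + win_width]
        ink_rows = np.flatnonzero(win.any(axis=1))
        # Foreground cue: fraction of ink pixels in the window.
        density = win.mean()
        if ink_rows.size:
            # Foreground cue: normalized vertical centre of rows containing ink.
            centre = ink_rows.mean() / max(height - 1, 1)
            # Background cues: blank margin above and below the ink.
            top_gap = ink_rows[0] / height
            bottom_gap = (height - 1 - ink_rows[-1]) / height
        else:
            # Empty window: treat it as background split evenly (a convention).
            centre, top_gap, bottom_gap = 0.5, 0.5, 0.5
        feats.append([density, centre, top_gap, bottom_gap])
    return np.array(feats)

if __name__ == "__main__":
    # Toy 4x4 "line" with a horizontal stroke on rows 1 and 2.
    toy = np.zeros((4, 4), dtype=np.uint8)
    toy[1:3, :] = 1
    print(sliding_window_features(toy, win_width=2, step=2))
```

In a full system, each row of the returned array would be one observation in the sequence fed to the character and filler HMMs. The real feature set, window size, and normalization would need to be taken from the paper itself.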
